Operations | Monitoring | ITSM | DevOps | Cloud

Top Seven E-commerce Platforms in 2020

The introduction of e-commerce stores has made life so easy for the people. It does not make a difference if you are the consumer or a seller. For a seller, it provides the opportunity to express the worth of their brand and product(s). For a consumer, it gives them an all in one platform, where they can shop for multiple categories. With this, the most significant ease for both parties is to opt for e-commerce business is that you can do all this without taking a step out of their house.

Logging Best Practices Part 2: General Best Practices

Isn’t all logging pretty much the same? Logs appear by default, like magic, without any further intervention by teams other than simply starting a system… right? While logging may seem like simple magic, there’s a lot to consider. Logs don’t just automatically appear for all levels of your architecture, and any logs that do automatically appear probably don’t have all of the details that you need to successfully understand what a system is doing.

BIRCH for Anomaly Detection with InfluxDB

In this tutorial, we’ll use the BIRCH (balanced iterative reducing and clustering using hierarchies) algorithm from scikit-learn with the ADTK (Anomaly Detection Tool Kit) package to detect anomalous CPU behavior. We’ll use the InfluxDB 2.0 Python Client to query our data in InfluxDB 2.0 and return it as a Pandas DataFrame. This tutorial assumes that you have InfluxDB and Telegraf installed and configured on your local machine to gather CPU stats.

SAI Something Linux: Monitoring Linux with Splunk App for Infrastructure

Metrics and logs go together like cookies and milk. Metrics tell you when you have a problem, and logs/events often tell you why that problem happened. But it’s always been harder than it needed to be to get both types of data onto a single screen, especially when the sysadmins using the tools aren’t necessarily daily experts in managing those monitoring platforms.

Diagnosing out-of-memory errors on Linux

Out-of-memory (OOM) errors take place when the Linux kernel can’t provide enough memory to run all of its user-space processes, causing at least one process to exit without warning. Without a comprehensive monitoring solution, OOM errors can be tricky to diagnose. In this post, you will learn how to use Datadog to diagnose OOM errors on Linux systems.

How to monitor Golden signals in Kubernetes

What are Golden signals metrics? How do you monitor golden signals in Kubernetes applications? Golden signals can help to detect issues of a microservices application. These signals are a reduced set of metrics that offer a wide view of a service from a user or consumer perspective, so you can detect potential problems that might be directly affecting the behaviour of the application.

Embed Your Status Page Everywhere

A well-crafted status page is designed to save you time, energy, and resources when communicating service irregularities. Instead of fielding thousands of support requests when you experience an outage, a status page provides a self-service way for your customers to get up-to-the minute information about any current downtime. It also allows you to proactively communicate maintenance and other work in advance.

Where did all my spans go? A guide to diagnosing dropped spans in Jaeger distributed tracing

Nothing is more frustrating than feeling like you’ve finally found the perfect trace only to see that you’re missing critical spans. In fact, a common question for new users and operators of Jaeger, the popular distributed tracing system, is: “Where did all my spans go?” In this post we’ll discuss how to diagnose and correct lost spans in each element of the Jaeger span ingestion pipeline.

Monitoring Your Dynamic Cloud Infrastructure

Fully taking advantage of cloud infrastructure includes the ability to scale up and down dynamically, taking the need and load off your services. The compute services like Amazon Web Services (AWS) EC2, Azure Virtual Machines (VM), and Google Cloud Platform (GCP) Compute Engine allow Auto Scaling of the instances of the service. This helps manage the responsiveness and costs of your cloud services by ensuring that the instance counts go up and down depending on demand.