Operations | Monitoring | ITSM | DevOps | Cloud

When to hire an Incident Commander

What comes to mind when you hear the term 'incident commander'? You are not alone if you think about fancy, tri-cornered hats, well-polished shoes, and a uniform weighed down by medals. The roles of incident commander, incident manager, or technical escalation manager have been typical in large organizations but are gaining popularity in smaller companies. For the purposes of this article, we will use the term 'incident commander,' but any of the above titles could work.

How to Implement Global View and High Availability for Prometheus

Ensuring that systems run reliably is a critical function of a site reliability engineer. A big part of that is collecting metrics, creating alerts, and graphing data. It’s of the utmost importance to gather system metrics from several locations and services and correlate them, both to understand how the system behaves and to support troubleshooting.

Application Discovery with DX Unified Infrastructure Management

Having any form of application discovery can be of great benefit. With these capabilities, you can determine what is deployed within your infrastructure and better understand what monitoring to apply to each device. When you know which applications are running within your environment, you can group devices by their associated applications.

Getting Started with C++ and InfluxDB

While relational database management systems (RDBMS) are efficient at storing tables, columns, and primary keys in a spreadsheet-like architecture, they become inefficient when a large volume of data arrives over a long period of time. Databases designed specifically to store time series data are known as time series databases (TSDB). For example, an RDBMS might look like this.

Challenges Faced While Implementing Predictive Maintenance

Implementing predictive maintenance is not an easy task. To execute it professionally, organizations need to overcome several predictive maintenance challenges. What are those challenges? What are the advantages of predictive maintenance, and how will it benefit your organization? What is predictive maintenance anyway? You will find answers to all these questions in this blog. So, let us begin!

Jitter vs Latency - What are the Differences and Why Those Things Matter

Jitter and latency are characteristics of traffic flow at the application layer, and both are metrics used to assess a network's performance. The major distinction between them is that latency is the delay a packet experiences crossing the network, whereas jitter is the variation in that latency over time. Increases in jitter and latency have a negative impact on network performance, so it's critical to monitor them regularly.
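To make the distinction concrete, here is a minimal sketch that computes both metrics from a handful of latency samples. It assumes one simple definition of jitter, the mean absolute difference between consecutive latency measurements; other definitions exist (for example, smoothed estimators used in streaming protocols). The function names and the sample values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def mean_latency(samples_ms):
    """Average delay across all latency samples, in milliseconds."""
    return mean(samples_ms)


def mean_jitter(samples_ms):
    """Average absolute change between consecutive latency samples.

    Assumption: jitter is taken as the mean absolute difference between
    neighbouring samples. With fewer than two samples there is no
    variation to measure, so 0.0 is returned.
    """
    if len(samples_ms) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(samples_ms, samples_ms[1:])]
    return mean(diffs)


# Hypothetical round-trip times in milliseconds.
samples = [20.0, 24.0, 22.0, 28.0, 26.0]

print(mean_latency(samples))  # 24.0
print(mean_jitter(samples))   # 3.5
```

Two links can share the same average latency yet differ sharply in jitter, which is why it can be useful to track both values rather than latency alone.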

Platform Engineering teams are the developer's cloud provider

Organizations rely more than ever on their engineering teams to get in front of their customers. Quickly delivering the latest functionality to end users in a reliable way can make or break a company these days. This need raises the pressure on engineering to deliver a scalable platform, roll out application updates faster, and manage applications efficiently once in production.

We think Grafana Labs has built something special - and two prestigious lists agree

We have always thought of our organization as special. Our plans were never to build a traditional business, and we know we have a unique culture. But it is nice when others outside of our company recognize that Grafana Labs is something special, too. This week, we were excited to be included on two very prestigious lists: The Enterprise Tech 30 and America’s Best Startup Employers.

What Does AIOps Mean for SREs? It's Complicated.

If you’re an SRE, you might view AIOps with great excitement. By automating complex workflows and troubleshooting processes, AIOps could make your life as an SRE much easier. Alternatively, SREs may choose to view AIOps with disdain. They might think of AIOps as just a fancy buzzword that doesn’t live up to its promises, and that can become a distraction from the SRE tools that really matter. Which perspective is right?

Predict the cost of IP ranges with new enhancements to the Resources tab

One of our most requested and popular features, IP ranges for the Docker executor, recently became available to all customers on a Performance or Scale plan. With IP ranges, you can route job traffic through an IP address that is verifiably associated with CircleCI. This enables your team to meet compliance requirements by limiting the connections that communicate with your infrastructure. With any new feature, you want to know how much it’s going to cost your team.