
A Closer Look at AlertBot's Alert Group Feature

If we start by sharing that AlertBot’s alert group feature lets you, well, alert certain groups, then you might wonder what earth-shattering revelations we have in store: that water is wet, fire is hot, and the pain of Game of Thrones’ final season will never, ever go away (seriously, whatever happened to Gendry?!). Yes, you’re right: the alert group feature IS about alerting groups of people about a site failure, but as George R.R.

Less is more: How Grafana Mimir queries run faster and more cost efficiently with fewer indexes

Over the past six months, we have been working on optimizing query performance in Grafana Mimir, the open source TSDB for long-term metrics storage. First, we tackled most of the out-of-memory errors in the Mimir store-gateway component by streaming results, as we discussed in a previous blog post. We also wrote about how we eliminated mmap from the store-gateway and, as a result, health check timeouts largely disappeared.

Observability and the DORA metrics

The Accelerate State of DevOps Report highlights four key metrics (known as the DORA metrics, for DevOps Research & Assessment) that distinguish high-performing software organizations: deployment frequency, lead time for changes, time to restore service, and change failure rate. Observability can kickstart a virtuous cycle that improves all the DORA metrics.
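As a rough illustration of how the four metrics can be derived from delivery data, here is a minimal Python sketch. The deployment and incident records are hypothetical, and using the median for lead time and time to restore is an assumption made for this example, not something the report prescribes.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical deployment records: (commit time, deploy time, caused a failure?)
deployments = [
    (datetime(2023, 8, 1, 9), datetime(2023, 8, 1, 15), False),
    (datetime(2023, 8, 2, 10), datetime(2023, 8, 3, 10), True),
    (datetime(2023, 8, 4, 8), datetime(2023, 8, 4, 12), False),
    (datetime(2023, 8, 7, 9), datetime(2023, 8, 7, 11), False),
]

# Hypothetical incidents: (service degraded at, service restored at)
incidents = [(datetime(2023, 8, 3, 10, 30), datetime(2023, 8, 3, 11, 30))]

# Observation window the records above are assumed to cover.
period = timedelta(days=7)

# Deployment frequency: deployments per day over the window.
deployment_frequency = len(deployments) / period.days

# Lead time for changes: median time from commit to running in production.
lead_time = median(deploy - commit for commit, deploy, _ in deployments)

# Time to restore: median time from incident start to recovery.
time_to_restore = median(end - start for start, end in incidents)

# Change failure rate: share of deployments that caused a failure.
change_failure_rate = sum(failed for _, _, failed in deployments) / len(deployments)

print(deployment_frequency, lead_time, time_to_restore, change_failure_rate)
```

In practice these inputs would come from a CI/CD system and an incident tracker rather than hard-coded lists.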

What is Website Maintenance: Your Ultimate Guide to Keeping Your Site Functional

Website maintenance is not that different from keeping up with the maintenance of real brick-and-mortar stores. Would you shop at a dirty store, filled with broken furniture, and selling outdated products? We didn’t think so. Website maintenance plays the same role: it makes the business inviting, makes you look professional, and engages customers.

Scaling Monitoring Administration with Experience-Driven NetOps: AppNeta and DX NetOps

Today, pretty much every critical business service, every critical employee job function, every critical customer transaction, and so much more are all reliant upon network connectivity. It falls to network operations (NetOps) teams to ensure network connections continue to support these demands. Over time, the scale and the complexity of the networks the organization relies upon have continued to grow, making the job of NetOps teams increasingly challenging.

Lookup Tables and Log Analysis: Extracting Insight from Logs

Extracting insights from log and security data can be a slow and resource-intensive endeavor, which is a poor fit for a data-driven world. Fortunately, lookup tables can help accelerate the interpretation of log data, enabling analysts to swiftly make sense of logs and transform them into actionable intelligence. This article will examine lookup tables and their relationship with log analysis.
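To make the idea concrete, here is a minimal Python sketch of a lookup table enriching log records. The table contents, the field names, and the `enrich` helper are all hypothetical and chosen only for illustration; real log platforms provide their own lookup mechanisms.

```python
# Hypothetical lookup table: HTTP status code -> human-readable meaning.
STATUS_LOOKUP = {
    "200": "OK",
    "404": "Not Found",
    "500": "Internal Server Error",
}


def enrich(record: dict, table: dict, key: str, new_field: str) -> dict:
    """Return a copy of record with new_field filled in from the lookup table."""
    enriched = dict(record)
    # Codes missing from the table are labeled "unknown" instead of raising.
    enriched[new_field] = table.get(record.get(key), "unknown")
    return enriched


# Hypothetical log records that carry only a terse status code.
logs = [
    {"path": "/login", "status": "200"},
    {"path": "/missing", "status": "404"},
    {"path": "/teapot", "status": "418"},
]

enriched_logs = [enrich(r, STATUS_LOOKUP, "status", "status_text") for r in logs]

for entry in enriched_logs:
    print(entry)
```

The lookup is a constant-time dictionary access, so the enrichment step adds little overhead per record, and the table can be updated without touching the logs themselves.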

Simplifying Data Lake Management with an Observability Pipeline

Data lakes can be difficult and costly to manage. They require skilled engineers to manage the infrastructure, keep data flowing, eliminate redundancy, and secure the data. We accept the difficulties because our data lakes house valuable information like logs, metrics, traces, etc. To add insult to injury, the data lake can be a black hole, where your data goes in but never comes out. If you are thinking there has to be a better way, we agree!

Telegraf Deployment Strategies with Docker Compose

This article, written by Shan Desai, was originally published on his blog and is reposted here with permission. Shan is a software engineer currently employed at Emerson Discrete Automation and is an open-source contributor and DIY tech enthusiast currently working with Industrial IoT. Telegraf is widely used as a metric aggregation tool thanks to the wide range of plugins it provides that interface with a multitude of systems without having to write complex software logic.

A Complete Guide to Tracking CDN Logs

The Content Delivery Network (CDN) market is projected to grow from 17.70 billion USD to 81.86 billion USD by 2026, according to a recent study. As more businesses adopt CDNs for their content distribution, CDN log tracking is becoming essential to achieve full-stack observability. That said, the widespread distribution of CDN servers can also make it challenging to gain visibility into your visitors’ behavior, optimize performance, and identify distribution issues.

Parsing logs with the OpenTelemetry Collector

This guide is for anyone who is getting started monitoring their application with OpenTelemetry, and is generating unstructured logs. As is well understood at this point, structured logs are ideal for post-hoc incident analysis and broad-range querying of your data. However, it’s not always feasible to implement highly structured logging at the code level.
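As a language-neutral illustration of what turning an unstructured line into a structured record involves, here is a minimal Python sketch. It is not OpenTelemetry Collector configuration; the log layout, the regular expression, and the `parse_line` helper are assumptions made for this example.

```python
import re

# Assumed log layout for illustration: "<timestamp> <LEVEL> <message>"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)


def parse_line(line: str) -> dict:
    """Parse one raw log line into named fields."""
    match = LINE_PATTERN.match(line)
    if match is None:
        # Keep lines that do not fit the pattern instead of dropping them.
        return {"message": line}
    return match.groupdict()


parsed = parse_line("2023-08-01T12:00:00Z ERROR payment service timed out")
print(parsed)
```

Once the fields are named, the level and timestamp can be filtered and aggregated directly instead of being searched for as free text.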