Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Observabilty for complex systems and related technologies.

The 12 Cats of Observability

On the surface, business-critical IT infrastructure and cats may not seem like they have a lot in common. But they’re way more alike than you might think. Our feline friends contain multitudes, as any cat parent will tell you. They’re complex and can sometimes drive you up a wall. But once they warm up to you—and you warm up to them—the joys and benefits of having them in your life outweigh just about everything. Sounds a lot like technology, right?

Streamlining Incident Investigation

Honeycomb Customer Success Manager Josh Levin explains how to troubleshoot production incidents using Honeycomb's telemetry data: metrics, traces, and logs. While these data forms have separate interfaces, you can investigate seamlessly within Honeycomb. Josh highlights the key role of the "retriever" service in data ingestion and querying and demonstrates cross-validating tracing data with metrics to spot anomalies in pod deployments and resource usage, presented in a separate dataset. He also uses effective log filtering and searching for keywords like "update status.".

Simplify observability with the Grafana OpenTelemetry Starter and Spring Boot 3

To help simplify instrumenting Spring Boot applications with Grafana Cloud, we are excited to introduce the Grafana OpenTelemetry Starter, a project that connects the latest Micrometer enhancements from Spring Boot 3 with Grafana Cloud using OpenTelemetry. By using these tools, you will have logs, metrics, and traces in a single service — in the same easy way that you can use Prometheus with Spring Boot.

Deploying the OpenTelemetry Collector to Kubernetes with Helm

The OpenTelemetry Collector is a useful application to have in your stack. However, deploying it has always felt a little time consuming: working out how to host the config, building the deployments, etc. The good news is the OpenTelemetry team also produces Helm charts for the Collector, and I’ve started leveraging them. There are a few things to think about when using them though, so I thought I’d go through them here.

Incident Review: What Comes Up Must First Go Down

On July 25th, 2023, we experienced a total Honeycomb outage. It impacted all user-facing components from 1:40 p.m. UTC to 2:48 p.m. UTC, during which no data could be processed or accessed. This outage is the most severe we’ve had since we had paying customers. In this review, we will cover the incident itself, and then we’ll zoom back out for an analysis of multiple contributing elements, our response, and the aftermath.

Calculating Sampling's Impact on SLOs and More

What do mall food courts and Honeycomb have in common? We both love sampling! Not only do we recommend it to many of our customers, we do it ourselves. But once Refinery (our tail-based sampling proxy) is set up, what comes next? Since sampling is inherently lossy, it’s good to be sure the organization’s most important measurements aren’t negatively affected.

Grafana Pyroscope 1.0 release: continuous profiling for a modern open source observability stack

When we launched Pyroscope in 2021, we had one clear goal: Give developers a powerful open source continuous profiling tool for collecting, storing, and analyzing profiling data. Grafana Labs had a similar goal when they released Grafana Phlare, a horizontally scalable, highly available open source profiling solution inspired by databases like Grafana Loki, Grafana Mimir, and Grafana Tempo.

Why Observability Architecture Matters in Modern IT Spaces

Observability architecture and design is becoming more important than ever among all types of IT teams. That’s because core elements in observability architecture are pivotal in ensuring complex software systems’ smooth functioning, reliability and resilience. And observability design can help you achieve operational excellence and deliver exceptional user experiences. In this article, we’ll delve into the vital role of observability design and architecture in IT environments.

How to deploy Hello World Elastic Observability on Google Cloud Run

Elastic Cloud Observability is the premiere tool to provide visibility into your running web apps. Google Cloud Run is the serverless platform of choice to run your web apps that need to scale up massively and scale down to zero. Elastic Observability combined with Google Cloud Run is the perfect solution for developers to deploy web apps that are auto-scaled with fully observable operations, in a way that’s straightforward to implement and manage.