Observability in under 5 seconds: Reflecting on a year of grafana/otel-lgtm

With grafana/otel-lgtm, observability is just one Docker command away. Over the past year, grafana/otel-lgtm has simplified observability setups, helping developers get a complete OpenTelemetry stack running in under five seconds. With integrations for metrics, logs, traces, and now profiles via Grafana Pyroscope, it has become a go-to solution for demos, development, and testing, as evidenced by its growing community (1k stars on GitHub and growing!) and notable adopters.

From chaos to clarity with Grafana dashboards: How video game company EA monitors 200+ metrics

To be a successful gamer, you have to think strategically and creatively. Working as a software engineer at Electronic Arts (EA), a top video game company, requires the same skills. That’s especially true when it comes to monitoring the EA app, which is the launcher for EA games and used by hundreds of millions of people around the world.

InfluxDB 3 Core: a complete rewrite designed for speed and simplicity

InfluxDB has been a popular time series database for the better part of a decade, and the latest release represents years of work behind the scenes to address several major feature requests users have been asking for since the earliest days of the time series database.

Faster incident response through distributed tracing: Inside Glovo's use of Traces Drilldown

It’s almost 1 p.m. on a Monday afternoon and you’re hungry. You pull up your meal delivery app and select your favorite restaurant and dish. Then you go to check out and nothing happens. Your frustration mounts as you get hungrier by the minute. But there’s frustration on the other side of that transaction as well—engineers are scrambling to figure out what’s wrong as orders drop and revenue losses rise.

Prometheus Gauges vs Counters: What to Use and When

Choosing the wrong metric type in Prometheus can lead to inaccurate dashboards, false positives in alerting, and missed indicators of system failure. Gauge metrics are intended for tracking values that can go up and down, such as memory usage, queue depth, or the number of active connections. Unlike counters, which only increment (or reset on restart), gauges reflect the current state of a resource at scrape time.

Going beyond AI chat response: How we're building an agentic system to drive Grafana

As we look at the role AI can play in Grafana going forward, we want to move beyond the simple chat responses that dominate the world of LLMs today and into agentic systems—AI that can understand, reason, and act on your behalf. The ultimate goal is to make it easy to get things done in Grafana using natural language—whether you’re a seasoned SRE or a new developer. And in the AI world, we call this moving from chat completion to task completion.

Operational Intelligence - the new horizon of observability

Monitoring your systems isn't enough anymore. Neither is “asking questions about your system”. Operational Intelligence embraces observability to proactively deliver business insights, support decision-making, and accelerate innovation. It seems that as the observability market grows and more and more products come into the space, the meaning of the term observability itself becomes more and more nebulous.

How Dropbox rebuilt its logging stack with Grafana Loki after a data center went dark

Two years ago, a power outage knocked a Dropbox data center offline. It wasn’t just any data center. It was the only one where Dropbox hosted Grafana Loki, meaning engineers couldn’t access their log data. “We had considered a data center outage when we were rolling out Loki, but it had just never risen up in priority enough to get put into multiple data centers,” said Chris Hodges, an infrastructure software engineer at the cloud storage company.