
Why Threshold Monitoring Fails in Distributed Systems

For years, infrastructure stability could be approximated through static limits. If CPU utilization exceeded a defined percentage or response time crossed a fixed boundary, risk was assumed to increase in a predictable way. Monitoring systems were designed around that assumption, and for contained environments, it largely held true.
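The static-limit approach described above can be sketched in a few lines. This is only an illustrative sketch: the `Sample` type, the `breached_limits` helper, and the two limit values are hypothetical and do not come from the article or from any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One observation of a host or service (hypothetical shape)."""
    cpu_percent: float
    response_ms: float


# Hypothetical fixed limits, chosen only for illustration.
CPU_LIMIT_PERCENT = 85.0
RESPONSE_LIMIT_MS = 500.0


def breached_limits(sample: Sample) -> list[str]:
    """Return the names of the static limits this sample exceeds."""
    breached = []
    if sample.cpu_percent > CPU_LIMIT_PERCENT:
        breached.append("cpu")
    if sample.response_ms > RESPONSE_LIMIT_MS:
        breached.append("response_time")
    return breached
```

Each sample is judged in isolation against constants, which is the assumption that held for contained environments: the check has no notion of dependencies between services or of what counts as normal for a given workload.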

Grafana 13 release: get value from your data faster, manage operations at scale, and more!

Who says 13 is unlucky? With the release of Grafana 13, we're giving the community the most streamlined, flexible, and intuitive Grafana experience yet. Unveiled during the opening keynote of GrafanaCON 2026, the latest major release is all about helping you get value from your data faster, whether you’re spinning up dashboards, operating Grafana at scale, or extending the platform as your requirements change.

Introducing o11y-bench: an open benchmark for AI agents running observability workflows

Evaluating agents is hard. Verifying observability tasks is harder. Yes, AI agents have gotten dramatically and quantifiably better at coding and tool use, but observability presents a different kind of challenge. In a real incident, the hard part is rarely just writing a query. It's deciding which signal matters, figuring out whether a spike is noise or symptom, correlating metrics with logs and traces, and sometimes making a change in Grafana without breaking the dashboard another engineer depends on.

AI Observability in Grafana Cloud: A complete solution for monitoring your agentic workloads

The observability industry has developed great tools for using metrics, logs, traces, and profiles to monitor the cloud native applications that have dominated the last decade of software development. But when it comes to understanding what an AI system is actually doing, we’re often left reading raw conversations, guessing at quality, and reacting too late. And that’s a problem.

GrafanaCON 2026 announcements: A guide to all the latest news from Grafana Labs

GrafanaCON 2026 kicked off in Barcelona, a fitting city in which to reveal the latest updates in Grafana 13. In 2013, Grafana Labs Co-founder Torkel Ödegaard made the first commit for what would become Grafana while he was on vacation in the Catalan city. "I was traveling here for the Christmas holiday and I got a cold and spent most of the day in bed coding and working on Grafana," said Torkel during the opening keynote of GrafanaCON, our biggest community event of the year.

No more monkey-patching: Better observability with tracing channels

Almost every production application uses a number of different tools and libraries, whether that’s a library to communicate with a database, a cache, or frameworks like Nest.js or Nitro. To observe what’s going on in production, application developers reach for Application Performance Monitoring (APM) tools like Sentry. But there’s an inherent problem: the performance data that APM tools need most often does not come natively from the libraries themselves.

Instrumenting WordPress with OpenTelemetry: PHP Tracing, Browser RUM, and Error Capture in Production

WordPress powers 40% of the web but has no native observability story. Here's how to instrument it end-to-end with OpenTelemetry: PHP, browser RUM, and errors. Prathamesh works as an evangelist at Last9, runs SRE stories, where SRE and DevOps folks share their stories, and maintains o11y.wiki, a glossary of all terms related to observability.

8 Signs Your Service Desk Automation Tool Has Become the Bottleneck

Most service desk automation problems get misdiagnosed. You see the ticket backlog, the manual work, and the slow incident response, and assume the issue is due to process, adoption, or staffing. But at some point, the math stops working. You’ve invested in a service desk automation tool, given it time to mature, built workflows around it, and the results still don’t match what was promised.