
Introducing o11y-bench: an open benchmark for AI agents running observability workflows

Evaluating agents is hard. Verifying observability tasks is harder. Yes, AI agents have gotten dramatically and quantifiably better at coding and tool use, but observability presents a different kind of challenge. In a real incident, the hard part is rarely just writing a query. It's deciding which signal matters, figuring out whether a spike is noise or symptom, correlating metrics with logs and traces, and sometimes making a change in Grafana without breaking the dashboard another engineer depends on.

AI Observability in Grafana Cloud: A complete solution for monitoring your agentic workloads

The observability industry has developed great tools for using metrics, logs, traces, and profiles to monitor the cloud native applications that have dominated the last decade of software development. But when it comes to understanding what an AI system is actually doing, we’re often left reading raw conversations, guessing at quality, and reacting too late. And that’s a problem.

GrafanaCON 2026 announcements: A guide to all the latest news from Grafana Labs

GrafanaCON 2026 kicked off in Barcelona, a fitting city in which to reveal the latest updates in Grafana 13. In 2013, Grafana Labs Co-founder Torkel Ödegaard made the first commit for what would become Grafana while he was on vacation in the Catalan city. "I was traveling here for the Christmas holiday and I got a cold and spent most of the day in bed coding and working on Grafana," said Torkel during the opening keynote of GrafanaCON, our biggest community event of the year.

No more monkey-patching: Better observability with tracing channels

Almost every production application uses a number of different tools and libraries, whether that's a library to communicate with a database, a cache, or frameworks like Nest.js or Nitro. To observe what's going on in production, application developers reach for Application Performance Monitoring (APM) tools like Sentry. But there's an inherent problem: the performance data that APM tools need most often doesn't come natively from the libraries themselves.
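To make the idea concrete, here is a minimal sketch of the publish/subscribe mechanism that Node.js tracing channels are built on, using the built-in `node:diagnostics_channel` module. The channel name `mylib:query` and the `runQuery` function are made up for illustration; the point is that the APM side observes events without patching the library's functions.

```typescript
import * as dc from "node:diagnostics_channel";

// Library side: publish a structured event on a named channel when a query runs.
// "mylib:query" is a hypothetical channel name, not from any real library.
const queryChannel = dc.channel("mylib:query");

function runQuery(sql: string): string {
  queryChannel.publish({ sql, startedAt: Date.now() });
  return `result for ${sql}`;
}

// APM side: subscribe to the channel. No monkey-patching of runQuery required.
const seen: string[] = [];
queryChannel.subscribe((message) => {
  seen.push((message as { sql: string }).sql);
});

runQuery("SELECT 1");
console.log(seen); // → [ 'SELECT 1' ]
```

Because the library itself publishes the event, the data is accurate by construction, and any number of subscribers (an APM agent, a logger, a test harness) can listen on the same channel.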

Instrumenting WordPress with OpenTelemetry: PHP Tracing, Browser RUM, and Error Capture in Production

WordPress powers 40% of the web but has no native observability story. Here's how to instrument it end to end with OpenTelemetry: PHP tracing, browser RUM, and error capture. Prathamesh works as an evangelist at Last9, runs SRE Stories, where SRE and DevOps folks share their stories, and maintains o11y.wiki, a glossary of terms related to observability.

8 Signs Your Service Desk Automation Tool Has Become the Bottleneck

Most service desk automation problems get misdiagnosed. You see the ticket backlog, the manual work, and the slow incident response, and assume the issue is due to process, adoption, or staffing. But at some point, the math stops working. You’ve invested in a service desk automation tool, given it time to mature, built workflows around it, and the results still don’t match what was promised.

RTO and RPO in Disaster Recovery Explained | Resilience Testing | Harness

Struggling with disaster recovery planning? Learn the simple difference between RTO and RPO, the two most important metrics every developer, DevOps engineer, and SRE must understand. RTO (Recovery Time Objective) tells you exactly how long your systems can stay down before it hurts your business. RPO (Recovery Point Objective) shows how much recent data you can afford to lose in an outage.
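The relationship between RPO and your backup schedule reduces to simple worst-case arithmetic: if an outage hits just before the next backup runs, you lose one full interval's worth of data. A small illustrative sketch (the `meetsRpo` helper is hypothetical, not from any specific tool):

```typescript
// Hypothetical helper: does a backup interval satisfy an RPO target?
// Worst case, the outage lands just before the next backup, so the
// maximum data loss equals the interval between backups.
function meetsRpo(backupIntervalMinutes: number, rpoMinutes: number): boolean {
  return backupIntervalMinutes <= rpoMinutes;
}

console.log(meetsRpo(15, 60));  // 15-min backups meet a 60-min RPO → true
console.log(meetsRpo(120, 60)); // 2-hour backups miss a 60-min RPO → false
```

The same worst-case framing applies to RTO: your recovery runbook's end-to-end duration (detection, failover, verification) must fit inside the RTO, not just the restore step alone.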

Stop Fighting Your Mouse: How I Traded "Drafting Slavery" for Real Restaurant Design

I've spent the better part of twelve years in the hospitality design trenches. If there's one truth I've brought back from the front lines, it's this: the soul of a restaurant is decided long before the chef ever steps into the kitchen. It's won or lost in your Restaurant Floor Plan.