Introducing o11y-bench: an open benchmark for AI agents running observability workflows
Evaluating agents is hard. Verifying observability tasks is harder.

Yes, AI agents have gotten dramatically and quantifiably better at coding and tool use, but observability presents a different kind of challenge. In a real incident, the hard part is rarely just writing a query. It's deciding which signal matters, figuring out whether a spike is noise or a symptom, correlating metrics with logs and traces, and sometimes making a change in Grafana without breaking the dashboard another engineer depends on.
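To make that concrete, here is a minimal sketch of the mechanical side of that workflow: pulling an error-rate series from Prometheus and the matching log lines from Loki over the same window. The endpoints, service name, and label selectors are hypothetical placeholders, not part of o11y-bench. The point is that this part is easy to script; deciding what to query, and what the result actually means, is the judgment the benchmark tries to measure.

```python
import time
import requests

# Hypothetical endpoints and selectors -- placeholders for illustration only.
PROM_URL = "http://prometheus:9090"
LOKI_URL = "http://loki:3100"
SERVICE = "checkout"

end = time.time()
start = end - 15 * 60  # look at the last 15 minutes

# The "easy" part: fetch a 5xx error-rate series for the service.
metrics = requests.get(
    f"{PROM_URL}/api/v1/query_range",
    params={
        "query": f'sum(rate(http_requests_total{{service="{SERVICE}",status=~"5.."}}[5m]))',
        "start": start,
        "end": end,
        "step": "30s",
    },
    timeout=10,
).json()["data"]["result"]

# ...and the error log streams from the same window, for correlation.
logs = requests.get(
    f"{LOKI_URL}/loki/api/v1/query_range",
    params={
        "query": f'{{service="{SERVICE}"}} |= "error"',
        "start": int(start * 1e9),  # Loki expects nanosecond timestamps
        "end": int(end * 1e9),
        "limit": 100,
    },
    timeout=10,
).json()["data"]["result"]

print(f"{len(metrics)} metric series, {len(logs)} log streams in the window")
```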