ChatOps that actually works: Grafana Cloud, Slack, and AI-powered observability

Context switching isn’t just inefficient—under pressure, it’s exhausting. It slows decision-making, increases the risk of mistakes, and makes even experienced engineers feel like they’re always a step behind the system they’re responsible for. At Grafana Labs, we want to build tools that meet you where you are. That's why we embedded Grafana Assistant, our context-aware AI assistant, directly in Grafana Cloud.

Moving Our Observability Data Collector from Sidecars to eBPF

For years, the Kubernetes sidecar pattern has been a practical way to capture observability data. Running a collector alongside each application pod gave us deep visibility into traffic, including full request and response payloads across supported protocols. However, as cloud-native environments have grown more complex, the limitations of sidecars—such as resource overhead, operational complexity, and scaling challenges—have become more apparent.

Why IT Leaders Are Consolidating Observability Tools in 2026

Consolidation unifies your observability stack, readies it for AI, and paves the path to autonomous IT. Many IT leaders consider consolidation because of cost pressure or rising vendor spend. But the real challenge goes deeper. IT environments have become more complex, distributed, and noisy, making it difficult for fragmented tools to keep up.

Observability with AI? Honeycomb with AI!

Since Honeycomb started, it has had a weakness: too many choices. Every field, custom or standard, hundreds of them, all are free to group, filter, and visualize in dozens of ways. Which ones are interesting? Honeycomb exists to help people understand custom software. It doesn’t pretend to know what matters in your application. That’s an interpretive task, not programmatic. Hey, computers can do interpretation now!

Building reliable dashboard agents with Datadog LLM Observability

This article is part of our series on how Datadog’s engineering teams use LLM Observability to iterate, evaluate, and ship AI-powered agents. In this first story, the Graphing AI team shares how they instrumented their widget- and dashboard-generation agents with LLM Observability to detect regressions and debug failures faster. Visibility into how large language model (LLM) applications behave in real time is essential for building reliable AI-driven systems at Datadog.

What is Runtime Context? A Practical Definition for the AI Era

TLDR: Runtime Context is live, execution-level access to a running production system. It lets engineers and AI agents ask precise questions of running code and get answers immediately, without redeploying or interrupting users. This is the new baseline for reliability.

"You Had One Job": Why Twenty Years of DevOps Has Failed to Do It

Let’s start with a question. What is DevOps all about? I’ll tell you my answer. In retrospect, I think the entire DevOps movement was a mighty, twenty-year battle to achieve one thing: a single feedback loop connecting devs with prod. On those grounds, it failed. Not because software engineers weren’t good at their jobs, or didn’t care enough. It failed because the technology wasn’t good enough.

OpAMP Explained: Why OpenTelemetry Needed an Agent Management Protocol (and How We Use It)

OpenTelemetry makes it easy to produce and transmit any type of telemetry. In production environments, this often means deploying the OpenTelemetry Collector as an intermediary to process, enrich, and route telemetry data. As systems scale, so does this infrastructure—sometimes to hundreds or thousands of Collectors spread across environments.

Why Observability Budgets Keep Growing Even When IT Is Asked to Cut Costs

Observability is the surprising budget line that isn’t shrinking. 96% of IT leaders expect observability budgets to hold steady or grow over the next 12 months. And 62% expect those budgets to increase regardless of broader IT budget cuts. Why? Because as infrastructure becomes more distributed and harder to manage, observability has shifted from a “nice to have” to a control point for cost, performance, and risk.