
Full-stack observability in Grafana Cloud: How to investigate issues across services and infrastructure

Many times, the hardest part of troubleshooting isn’t fixing the actual problem. It’s figuring out where to start. As engineers, it’s easy to lose count of how many times we’ve opened logs, then 10 metrics tabs, and another 10 tabs with trace queries, only to end up back in the logs trying to find a root cause.

Overview of AI Evaluation (The Context Window #05)

Can you actually trust an AI agent? In this pre-recorded episode of The Context Window, Nicole van der Hoeven sits down with Yas Ekinci, an engineer on the Grafana AI team, to talk about evals: how Grafana measures the quality and reliability of the AI it ships. They get into the difference between online and offline evals, why reviewing AI-generated code has become the real bottleneck, the "final answer problem" of plausible-but-wrong outputs, and o11y-bench, Grafana's open benchmark for observability agents.

How Grafana Cloud Ingests Your Data | Data Sources, Alloy & OTel Explained

Learn the two main ways to get data into Grafana Cloud. In this video, we break down how Grafana Cloud connects to over 150 external data sources (like Salesforce, Postgres, and CloudWatch) where your data stays in place, and how you can send raw telemetry into Grafana’s fully managed databases for logs, metrics, traces, and profiles.

Grafana 13.1 release: observability as code updates, extending Grafana Assistant across more data sources, and more

Earlier this year, Grafana 13 laid the groundwork for making it easier and faster than ever to turn your data into actionable insights. With our latest minor release, Grafana 13.1, we're building on that foundation, expanding observability as code, bringing Grafana Assistant to more data sources, and streamlining the everyday workflows teams rely on to visualize, analyze, and act on their data. Below are just some of the highlights from Grafana 13.1.

Observability for a Privacy-first AI Wearable | Grafana Everywhere

Trust is everything when AI gets personal. Golden Grot Award winner and NeoSapien co-founder and CEO Dhananjay Yadav shares how his team uses Grafana Assistant to ensure the privacy-first AI wearable delivers a seamless, reliable experience without compromising its mission. Because when AI moves closer to our everyday lives, teams need to know what’s happening — and users need to trust that it’s working as intended.

Inside the AI Team Weekly: AI Observability workflows and Prometheus exemplars (May 19th, 2026)

The Grafana AI team (engineers Ivana Hučková and Sonia Aguilar) shares what's new in AI Observability this week: a new way to instrument and visualize agent workflows, plus a neat trick for jumping straight from a metric spike to the exact conversation that caused it using Prometheus exemplars. We're showing parts of our team meetings to build in public in some small way and give you a sneak preview of what's to come. But not all features we show may make it to production! You've been warned. :)

Grafana Tempo: The distributed tracing journey to 3.0 (June 2026 Community Call)

Our distributed tracing journey from the inception of Tempo to 3.0.

Automatically discover and remediate root causes with Grafana Assistant Investigations

You can use Grafana Assistant Investigations to automatically discover incidents and help find root causes—and this AI-powered Grafana Cloud feature recently got a major upgrade to give you even more confidence in its findings. You can read more about the behind-the-scenes effort in our new engineering blog Unprompted, where we get into harness engineering, context compaction, benchmarking, and keeping agents alive and working well in long-running sessions.

Asimov's Zeroth Law of Robotics: testing and observing AI (ExpoQA 2026)

Asimov's Three Laws of Robotics are missing one — and when it comes to testing and observing AI, Nicole van der Hoeven argues that missing rule changes everything: before a robot can avoid harm, obey orders, or protect itself, there has to be a Zeroth Law: a robot must be observable. Because if you can't see what a system is doing, you have no way of knowing whether it's following any rule at all.

Why Engineers Don't Trust Autonomous AI - 4th Annual Observability Survey | Grafana Labs

The 2026 Observability Survey from Grafana Labs heard from over 1,300 engineers and leaders across 76 countries on the real-world role of AI in observability. The data reveals a sharp distinction between intelligence and autonomy — and a critical blind spot most teams have.

AI Observability Deep Dive Demo | Grafana Cloud

Grafana AI Observability is our new database and platform for observing AI Agents. Over the past year at Grafana Labs, we built Agents, and we needed a way to understand how they perform, what they cost, what their error rate and time to first token are, and how they behave. Grafana Staff Engineer Ivana Hučková provides a deep dive demo on how Grafana AI Observability connects our experience building Agents with our experience building observability systems.

Grafana Assistant Context Offloading

Context Offloading is a pipeline solution for managing Observability with AI Agents. If you are building AI Agents that work with real data, the context window can very easily fill up with bloated context that the Agent does not really need. Sven demonstrates "Context Offloading", a solution that stores the full JSON result and sends the LLM only a summary of the JSON blob, which speeds up the LLM loop and keeps your context window small.
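
To make the idea concrete, here is a minimal sketch of the store-and-summarize pattern described above. It is not Grafana Assistant's implementation: the names ContextStore, summarize, and offload, the in-memory dictionary, and the shape of the summary are all assumptions made for illustration.

```python
import hashlib
import json


class ContextStore:
    """Hypothetical in-memory store for offloaded tool results."""

    def __init__(self):
        self._blobs = {}

    def put(self, payload):
        # Serialize deterministically and derive a short reference key from the content.
        raw = json.dumps(payload, sort_keys=True)
        key = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]
        self._blobs[key] = raw
        return key

    def get(self, key):
        # Retrieve the full result later, only if the agent actually needs it.
        return json.loads(self._blobs[key])


def summarize(payload, max_keys=5):
    """Describe the shape of the payload instead of repeating its contents."""
    if isinstance(payload, list):
        first = payload[0] if payload else None
        fields = sorted(first)[:max_keys] if isinstance(first, dict) else []
        return {"type": "list", "length": len(payload), "fields": fields}
    if isinstance(payload, dict):
        return {"type": "object", "keys": sorted(payload)[:max_keys]}
    return {"type": type(payload).__name__}


def offload(store, payload):
    """Store the full result and return only a reference plus a summary for the LLM."""
    key = store.put(payload)
    return {"ref": key, "summary": summarize(payload)}


# Example: a large query result (made-up data) that would otherwise fill the context window.
store = ContextStore()
result = [{"service": "checkout", "latency_ms": 120 + i} for i in range(1000)]
message = offload(store, result)
```

In this sketch, only message would be placed in the model's context, and the full result remains available through store.get(message["ref"]). A real pipeline would likely use persistent storage and a richer summary, such as value ranges or sample rows.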

Observability for Healthcare Systems | Grafana Everywhere

Grafana Assistant is going places you might not expect — including healthcare. Golden Grot winner Oren Lion from TeleTracking reveals how Grafana Cloud supports their systems that help keep patient care moving — and how Assistant enables teams to get from “what happened?” to “here’s why” faster. From moon landings to patient care, Grafana is everywhere. Congratulations to Oren, Chris Johnson, Mark Munson, and the entire TeleTracking team on winning this year's Golden Grot Award for Pioneering AI in Observability!

How to generate real-world load tests using Grafana Cloud k6 and production telemetry

For many development teams, a load test starts with a set of assumptions. You pick 100 virtual users because it sounds reasonable. You ramp for 30 seconds because that's what the tutorial showed. You set a 500ms threshold because it feels like a good target. The test passes, you ship the release, and production falls over at 6 p.m. on a Tuesday because your synthetic load never resembled how real users interact with your application.

What's New in Tempo 3.0

Tempo 3.0 introduces a major architectural shift that decouples the read and write paths, with Kafka handling durability on the write side and a new live store serving recent traces on the read side. Blocks are now written at a replication factor of one instead of three, significantly reducing storage overhead. This release also brings TraceQL metrics to general availability, adds comparison operators for filtering metric results at query time, and introduces a new Tempo CLI redact command for removing sensitive trace data on demand without waiting for retention to expire.

Tempo 3.0 release: a new architecture for scale and lower TCO, TraceQL metrics GA, and more

Tempo started with a simple goal: make distributed tracing easier to run at scale. As tracing adoption has grown, however, so have the challenges, including higher data volumes, more complex architectures, and increasing demand for real-time insights directly from traces. Over the last year, we’ve been evolving Tempo’s architecture to meet that moment. And today, we’re sharing the results of those efforts with the release of Tempo 3.0.

Inside the Grafana AI Team Weekly: AI Observability for the OTel demo and LLMSpec (May 12, 2026)

This is an excerpt from a real AI team weekly meeting where we talk about the stuff we build and occasionally also demo it! In this one, Principal Software Engineer Sven Großmann demos how he integrated AI Observability into the OTel demo, complete with the guards feature he introduced last week, and Principal Software Engineer Yas Ekinci gives a rare glimpse of LLMSpec, the internal counterpart of the o11y-bench benchmark that we use to evaluate Assistant.