
Lumigo Launches AI Agent Observability

LLM-powered agents are reshaping software, but when they fail, troubleshooting is guesswork. Lumigo’s new AI Agent Observability, now in beta, gives you visibility into the entire lifecycle of your agents, from prompt to response to internal decision logic. Built for modern AI workloads, this feature is designed to help engineers monitor, debug, and optimize agents running on platforms like OpenAI, Anthropic, and open-source models.

Observability for containerized workloads: How to run Grafana Beyla as a sidecar in Amazon ECS

Note: Grafana Beyla has been donated to OpenTelemetry under the new project name OpenTelemetry eBPF Instrumentation. Beyla will continue to exist as Grafana Labs’ distribution of the upstream project. Grafana Beyla is an open source eBPF-based auto-instrumentation tool that helps you easily get started with application observability, allowing you to monitor and visualize traces without modifying the application code.

Honeycomb Users Are Living in the Future, Part 1: Sampling

When we talk to new Honeycomb users, a few things stand out as sounding downright magical. Sometimes we’ll hear, “Wow, is that a new feature?” and we’ll say that no, it’s been like that for years. Clearly we need to get the word out! This is the first installment of a blog series I’ll be writing, covering areas of Honeycomb that elicit reactions of awe and disbelief from new users.

Monitoring & Observability Report Top Findings

Today, BigPanda released our first-ever research report based on data gathered from our agentic IT operations platform. Our Monitoring and Observability Tool Effectiveness for IT Event Management report provides insights and benchmarks on incident detection and noise reduction for 130 enterprise organizations, including the monitoring and observability data sources integrated with BigPanda.

Observability in under 5 seconds: Reflecting on a year of grafana/otel-lgtm

With grafana/otel-lgtm, observability is just one Docker command away. Over the past year, grafana/otel-lgtm has simplified observability setups, helping developers get a complete OpenTelemetry stack running in under five seconds. With integrations for metrics, logs, traces, and now profiles via Grafana Pyroscope, it has become a go-to solution for demos, development, and testing, as evidenced by its growing community (1k stars on GitHub and growing!) and notable adopters.

How to Simplify AI Observability Across Hybrid and Cloud Environments

As companies adopt more artificial intelligence (AI) to stay competitive and simplify operations, they’re hitting a snag they’ve seen plenty of times before: complexity. Those user-friendly chatbots and impressive predictive models aren’t magic—they run on powerful GPUs like NVIDIA’s and rely on cloud services such as Azure OpenAI or Amazon SageMaker.

Observability isn't about the tool. It's about the truth

An enterprise client reports latency. Your dashboards say everything is fine. They blame you. You blame them. Nobody can prove it either way. This is where most monitoring efforts hit a wall. Too often, the conversation gets stuck on dashboards and tools instead of the one thing that really matters: truth. Observability isn’t about collecting metrics or building pretty dashboards.

LangChain Observability: From Zero to Production in 10 Minutes

LangChain apps are powerful, but they’re not easy to monitor. A single request might pass through an LLM, a vector store, external APIs, and a custom chain of tools. And when something slows down or silently fails, debugging is often guesswork. In one instance, a developer ended up with an unexpected $30,000 OpenAI bill, with no visibility into what triggered it. This blog shows how to avoid that using OpenTelemetry and LangSmith.

MCP Observability with OpenTelemetry

2025 has truly been the year of Agentic AI, with MCP (Model Context Protocol) emerging as one of its flashiest and most talked-about innovations. While many products have seamlessly integrated MCP servers into their systems, these servers are increasingly being labelled as black boxes: opaque components that handle critical tasks but offer little visibility into what's happening under the hood. We prompt an agent, a tool gets invoked, and a response is generated. But what really happens in between?