The latest news and information on observability for complex systems and related technologies.

What Customers Are Doing With AI and Honeycomb

Jun 30, 2026 By Rox Williams In Honeycomb

At O11yCon, we talked to engineering teams across the industry, and the numbers are starting to get genuinely wild: Mixpanel DevOps Engineer Eddie Bracho told us their engineering team is generating 50% more PRs than before AI came into the mix (sorry). That kind of velocity is exciting, but it's also a pressure test for every part of your stack that isn't writing code, including your observability practice. Here's what we're hearing from customers about how that's playing out.

Debug and evaluate your AI app from your coding agent with Datadog Agent Observability

Jun 30, 2026 By Michael Bevilacqua-Linn In Datadog

Coding agents like Claude Code, Cursor, and Codex CLI handle the coding parts of building an AI application well. The harder work comes after: understanding why a response went wrong, building eval sets that reflect real production behavior, and keeping up with an application that changes faster than any one-off script can. Teams spend 60–80% of their time on evaluation and error analysis, and much of that work needs to be redone every time the stack shifts.

New Feature: Automatic Snapshots When Latency Spikes

Jun 30, 2026 By Roi Bar In Lightrun

We’ve released an exciting new Lightrun capability: set a duration threshold on your Tic & Toc or Method Duration metrics, and Lightrun will automatically capture a snapshot whenever execution exceeds it. It takes moments to configure, and gives engineers the runtime context they need to understand why unexpected slow executions are occurring.
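The threshold-triggered capture described above can be illustrated with a minimal Python sketch. This is not Lightrun's API or implementation: the `snapshot_if_slow` decorator, the `SNAPSHOTS` list, and the `handle_request` function are all hypothetical names, assumed only to show the idea of recording call context when a measured duration exceeds a configured limit.

```python
import functools
import time

# Hypothetical in-memory store for captured context; not part of any Lightrun API.
SNAPSHOTS = []


def snapshot_if_slow(threshold_s):
    """Record call context whenever the wrapped function runs longer than threshold_s."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                if elapsed > threshold_s:
                    # Capture the arguments that produced the slow execution.
                    SNAPSHOTS.append({
                        "function": func.__name__,
                        "elapsed_s": elapsed,
                        "args": args,
                        "kwargs": kwargs,
                    })
        return wrapper
    return decorator


@snapshot_if_slow(threshold_s=0.05)
def handle_request(delay_s):
    time.sleep(delay_s)  # stand-in for real work
    return "done"


handle_request(0.0)  # fast call: below the threshold, nothing recorded
handle_request(0.1)  # slow call: exceeds the threshold, one snapshot recorded
```

A real runtime tool would attach this at a chosen code location without a redeploy; the decorator only mirrors the trigger logic.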

What Is Agentic Observability? The Complete Guide for Enterprise Engineering Teams

Jun 29, 2026 By Libi Michelson In logz.io

TL;DR Agentic observability uses AI agents to autonomously investigate incidents, identify root causes, and take action in production environments. Unlike traditional monitoring (which alerts and waits) or AIOps (which assists human analysis), agentic platforms conduct the investigation themselves. Key capabilities include autonomous incident triage, evidence-backed root cause analysis, alert noise reduction, and governed remediation.

Instrumenting AI Agents for the Agent Timeline: A Practical OpenTelemetry Guide

Jun 29, 2026 By Dan Juengst In Honeycomb

AI agents are nondeterministic, multi-step, and opaque. When one fails in production, "the model said something weird" is the cheapest, most useless line in your incident postmortem. To debug agents the way they actually run, you need telemetry that captures all of it, in order, with enough context to reconstruct what happened. The OpenTelemetry GenAI Semantic Conventions give you a vendor-neutral way to do exactly that.
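As a rough illustration of "all of it, in order, with enough context", the sketch below models one agent run as an ordered list of spans. It uses only the standard library rather than the OpenTelemetry SDK, and the `Span` class, the agent name `support-bot`, the model name `example-model`, and the tool name `lookup_order` are hypothetical. The `gen_ai.*` attribute keys follow the GenAI Semantic Conventions as I understand them; the conventions are still evolving, so check the current specification before relying on any key.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical stand-in for a tracing span: a name, attributes, and a start time."""
    name: str
    attributes: dict
    start_ns: int = field(default_factory=time.monotonic_ns)


def agent_timeline():
    """Return spans for one hypothetical agent run, in execution order."""
    return [
        Span("invoke_agent support-bot", {
            "gen_ai.operation.name": "invoke_agent",
            "gen_ai.agent.name": "support-bot",
        }),
        Span("chat example-model", {
            "gen_ai.operation.name": "chat",
            "gen_ai.request.model": "example-model",
            "gen_ai.usage.input_tokens": 120,
            "gen_ai.usage.output_tokens": 45,
        }),
        Span("execute_tool lookup_order", {
            "gen_ai.operation.name": "execute_tool",
            "gen_ai.tool.name": "lookup_order",
        }),
    ]


timeline = agent_timeline()
for span in timeline:
    print(span.name, span.attributes)
```

With consistent operation names on every step, a postmortem can filter the timeline by model call or tool execution instead of guessing which step misbehaved.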

Why Observability Isn't Enough for AI Coding Agents

Jun 29, 2026 By Lightrun Team In Lightrun

Observability platforms collect pre-instrumented logs, metrics, and distributed traces to monitor production systems and surface failures to human engineers. The adoption of AI into engineering has led observability providers to offer those same signals to agents. This is often packaged as AI observability, but the signals themselves were designed around a human investigation loop. AI coding agents work faster, consume data differently, and need feedback as they work rather than after deployment.

Fleet Observability: Linux Edge Device Monitoring

Jun 28, 2026 By Netdata Team In netdata

It feels less like managing devices and more like remote babysitting. You check the dashboard, everything is green, and then a customer in the field tells you a device has been down for two days. At a handful of servers, the rare failure is an event.

From query to action: Introducing SQL alerting in Cloud Monitoring Observability Analytics

Jun 27, 2026 By Joy Wang In Google Operations

Cloud Monitoring Observability Analytics lets you create alerts from (and get alerted about) analytical SQL queries of logs and traces.
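The general pattern of alerting from a SQL query can be sketched with Python's built-in `sqlite3` module. This is not Cloud Monitoring's API or query dialect: the `logs` table, its columns, the sample rows, and the `evaluate_alert` function are assumptions made for illustration. The alert condition is simply "the query returns at least one row".

```python
import sqlite3

# Hypothetical log table with a handful of sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (service TEXT, severity TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [
        ("checkout", "ERROR"),
        ("checkout", "ERROR"),
        ("checkout", "INFO"),
        ("search", "ERROR"),
        ("search", "INFO"),
    ],
)

# Analytical query: services whose error count meets or exceeds a threshold.
ALERT_QUERY = """
SELECT service, COUNT(*) AS errors
FROM logs
WHERE severity = 'ERROR'
GROUP BY service
HAVING COUNT(*) >= ?
ORDER BY service
"""


def evaluate_alert(connection, min_errors):
    """Run the alert query; any returned row means the alert should fire."""
    return connection.execute(ALERT_QUERY, (min_errors,)).fetchall()


firing = evaluate_alert(conn, min_errors=2)
for service, errors in firing:
    print(f"ALERT: {service} logged {errors} errors")
```

Keeping the threshold as a query parameter lets the same query back several alert policies with different sensitivities.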

Runtime Aware PR Review: Validate Changes in Live Production

Jun 26, 2026 By Lightrun Team In Lightrun

Runtime PR review means validating a code change against live variable state, real execution paths, and downstream service behavior before the merge decision. Not after a checkout regression exposes what the diff missed. As AI coding agents ship PRs faster than any reviewer can mentally simulate execution, static analysis and CI leave a structural gap that only runtime evidence can close. This article explains what that gap looks like, why it recurs, and how to close it with runtime context code review.

Grafana + Uptrace: Reuse Your Dashboards in Seconds

Jun 26, 2026 By Uptrace In Uptrace

In this tutorial you'll learn how to use Uptrace and Grafana together. Uptrace exposes a Prometheus-compatible HTTP endpoint, so you can add it as a data source in Grafana and reuse your existing dashboards without changing metric names or rewriting queries.
