Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Distributed Tracing and related technologies.

Icinga 2 Meets OpenTelemetry: Native Metrics Export in v2.16

The OTLPMetricsWriter is a new Icinga 2 feature available since v2.16 that exports check plugin performance data as OpenTelemetry-compliant metrics via the OTLP HTTP protocol. With a single configuration object, it connects Icinga 2 to any OTLP-compatible backend like Prometheus, Grafana Mimir, Datadog, Elasticsearch, VictoriaMetrics, and more.

How to Exclude Health Check Endpoints from Python OTel Traces

Health check endpoints generate thousands of identical, useless spans per day. Here are two production-ready approaches to filter them from your Python OTel traces — and the correctness trap most implementations miss. Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.

Fixing Broken Traces in GCP Cloud Run: A Custom OpenTelemetry Propagator

GCP's load balancer silently rewrites your traceparent header, orphaning spans in any OTLP backend. Here's the custom propagator that fixes it. Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.

Why Your PromQL Availability Query Returns Nothing When Services Are Healthy

Your SLI query shows 100% availability as No Data. Here's why PromQL returns empty results instead of zero — and the label-preserving fix. Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.

Managing OpenTelemetry Semantic Convention Migrations With the Collector

Real production data tells the story better than I can. Juraci Paixão Kröhling, a friend and fellow observability practitioner at OllyGarden, recently shared an example from an anonymized production environment: 1,830 occurrences of http.url and 23,984 occurrences of url.full in the same dataset. Both attributes describe the same thing. Both are actively being written to the same backend at the same time.

Fast AI Feedback Loops with Honeycomb and OpenTelemetry

Are you writing agentic applications, but aren’t sure what the agents are doing? Finding out too late that you've blown the budget with super expensive models? Not sure where the agents are failing, and feeling a loss of control? Could they do better? Observability is the visibility you need to get the job done. Sending telemetry to Honeycomb explains what your agents are actually doing.

Uptrace MCP Server: Auto-Generate Dashboards with AI in Minutes

Tired of clicking through menus to build observability dashboards? In this video I walk through how to configure the Uptrace MCP (Model Context Protocol) server and connect it to an AI assistant so your dashboards get created automatically from natural-language prompts. You'll learn how to: By the end you'll have a working setup where describing what you want to monitor is enough to get a real, shareable dashboard in Uptrace.

Route OTel data from AI apps to ClickHouse and Datadog using Observability Pipelines

As organizations continue to heavily invest in AI and build more agentic workflows, their telemetry data volumes can surge quickly, and the associated costs can become unpredictable. To regain control of their data, many AI-forward teams are turning to high-throughput, low-latency pipelines to collect and route data to tools such as OpenTelemetry (OTel) and ClickHouse. But these self-hosted solutions come with drawbacks.

Manage service tracing across hosts with Single Step Instrumentation rules

Single Step Instrumentation (SSI) simplifies Datadog Application Performance Monitoring (APM) by automatically discovering and instrumenting services across a host. For many teams, SSI is the ideal starting point because it helps them achieve full visibility with minimal setup. However, as environments grow, teams often want more control over which services get traced. Auxiliary workloads such as batch jobs and cron tasks might not require distributed tracing.