Operations | Monitoring | ITSM | DevOps | Cloud

OpenTelemetry Distributed Tracing Implementation Guide

Distributed tracing has become essential for understanding the performance and behavior of modern microservices architectures. As applications become more complex with multiple services communicating across different environments, traditional logging and metrics alone are insufficient for debugging performance issues and understanding request flows.

New in OTel: Auto-Instrument Your Apps with the OTel Injector

As distributed systems scale, maintaining manual instrumentation across services quickly becomes unsustainable. The OTel Injector addresses this by automatically attaching OpenTelemetry instrumentation to applications, no code changes needed. This blog covers how the OTel Injector works, how it integrates with Linux environments, and how to set it up for consistent telemetry across your stack.

How Prometheus 3.0 Fixes Resource Attributes for OTel Metrics

When you export OpenTelemetry metrics to Prometheus, resource fields like service.name or deployment.environment don’t show up as metric labels. Prometheus drops them. To use them in queries, you’d have to join with target_info: This makes filtering and grouping more difficult than necessary. Prometheus 3.0 changes that. It supports resource attribute promotion—automatically converting OpenTelemetry resource fields into Prometheus labels.

OTel Weaver: Consistent Observability with Semantic Conventions

Deploying a new service shouldn’t break dashboards. But it happens, usually because metric names or labels aren’t consistent across teams. You end up with traces that don’t link, metrics that don’t align, and queries that take hours to debug, not because the system is complex, but because the telemetry is fragmented. OTel Weaver addresses this by enforcing OpenTelemetry semantic conventions at the source.

Observing Vercel AI SDK with OpenTelemetry + SigNoz

LLM-powered apps are growing fast, and frameworks like the Vercel AI SDK make it easy to build them. But with AI comes complexity. Latency issues, unpredictable outputs, and opaque failures can impact user experience. That’s why monitoring is essential. By using OpenTelemetry for standard instrumentation and SigNoz for observability, you can track performance, detect errors, and gain insights into your AI app’s behavior with minimal setup.

How to Build Resilient Telemetry Pipelines with the OpenTelemetry Collector: High Availability and Gateway Architecture

Let’s bring that back. Today you’ll learn how to configure high availability for the OpenTelemetry Collector so you don’t lose telemetry during node failures, rolling upgrades, or traffic spikes. The guide covers both Docker and Kubernetes samples with hands-on demos of configs. But first, let’s lay some groundwork.

OpenTelemetry NestJS Implementation Guide: Complete Setup for Production [2025]

NestJS applications require comprehensive monitoring to ensure optimal performance and rapid issue resolution. As your application grows—spanning multiple services, databases, and external APIs—understanding what's happening under the hood becomes critical. That's where OpenTelemetry comes in. OpenTelemetry provides vendor-agnostic observability for your NestJS applications through distributed tracing, metrics, and logs.

Zero instrumentation distributed tracing is here: Meet OBI on Open Telemetry

Modern systems generate enormous amounts of telemetry. The hurdle is collecting clean, connected traces without rewriting code or babysitting a fleet of language agents. That’s why Coralogix backed eBPF from the start. eBPF (extended Berkeley Packet Filter) executes sandboxed programs inside the Linux kernel, without modifying kernel source code. This method allows probes to see every request, at runtime with no instrumentation, and with near zero per‑request overhead.

OpenTelemetry at Grafana Labs: the latest on how we're investing in the emerging industry standard

Here at Grafana Labs, open source has always been core to what we do. So it should come as no surprise that we’re going all in on OpenTelemetry—an open source project that’s quickly becoming an industry standard for vendor-neutral telemetry.

Monitor Nginx with OpenTelemetry Tracing

At 3:47 AM, your NGINX logs show a 500 error. Around the same time, your APM flags a spike in API latency. But what's the root cause, and why is it so hard to correlate logs, traces, and metrics? When API response times cross 3 seconds, identifying whether the slowdown is at the NGINX layer, the application, or the database shouldn't require guesswork. That's where OpenTelemetry instrumentation for NGINX becomes essential.

Trace Go Apps Using Runtime Tracing and OpenTelemetry

When your Go service hits 500ms latencies but CPU usage is flat, tracing gives you visibility into what the profiler misses. With 1–2% runtime overhead, Go’s built-in tracing tools help you: This makes it easier to debug performance regressions that don’t leave a clear footprint.

Kubernetes Observability with OpenTelemetry | A Complete Setup Guide

Kubernetes provides a wealth of telemetry data from container metrics and application traces to cluster events and logs. OpenTelemetry offers a vendor-neutral, end-to-end solution for collecting and exporting this telemetry in a standardised format.

Enable Kong Gateway Tracing in 5 Minutes

Kong Gateway is a popular API gateway that sits at the edge of your infrastructure, routing and shaping traffic across microservices. It’s fast, pluggable, and battle-tested, but for many teams, it remains a black box. You might have OpenTelemetry set up across your application stack. Traces flow from your app servers, databases, and third-party APIs. But the moment a request enters through Kong, observability drops off.

Choosing the right OpenTelemetry Collector distribution

The OpenTelemetry (OTel) Collector plays a central role in collecting, processing, and exporting telemetry data. If you’re deploying the Collector in production, chances are you’ve reached for the otelcol-contrib distribution. It’s the easiest, most flexible, and most documented distribution, used in nearly every demo and getting-started guide. But here’s the catch: It’s not actually recommended for production use.

Proactively troubleshoot with synthetic testing and distributed tracing

As your application grows in complexity, identifying the root cause of issues becomes increasingly difficult. Many monitoring strategies make this even harder by siloing frontend and backend data. To effectively troubleshoot problems that spread across your app, you need visibility not just into each part of your stack, but also into how these parts interact.

Jaeger Metrics: Internal Operations and Service Performance Monitoring

You're monitoring a microservices-based system. Alerts trigger when response times exceed 2 seconds. But when you open Jaeger, you're faced with thousands of traces. Identifying which service or operation is responsible becomes time-consuming. Jaeger metrics help reduce this friction by exposing aggregated telemetry. Instead of scanning individual traces, you get service-level and operation-level performance metrics, latency, throughput, and error rates that highlight where the issue lies.

Datadog vs Jaeger - Features, Pricing & Use Cases [Updated for 2025]

Datadog and Jaeger are both leading tools in the observability space, but they represent two fundamentally different philosophies. Datadog is a commercial, all-in-one SaaS platform that unifies metrics, traces, and logs. Jaeger is a popular, open-source project focused specifically on distributed tracing. Choosing between them isn't just a technical decision; it's about balancing the convenience of a fully managed, integrated platform against the power and control of a self-hosted, specialized tool.

What Are Traces? A Developer's Guide to Distributed Tracing

One of the most common challenges in modern software engineering today is understanding how requests flow through applications. As system architectures shift to favor widely distributed, cloud-native designs, keeping track of how an application processes user actions is more difficult than ever. A single user action may trigger events processed in dozens of backend services. Traces are helping software developers today with this challenge.

OpenTelemetry Collector: A Complete Guide [2025]

The OpenTelemetry Collector is a stand-alone service that acts as a powerful, vendor-neutral pipeline for your telemetry data. It can receive, process, and export logs, metrics, and traces, giving you full control over your observability data before it reaches a backend. This guide will provide a comprehensive overview of the OpenTelemetry Collector, its architecture, deployment patterns, and how to configure it for production use.

Improve Consistency Across Signals with OTel Semantic Conventions

It’s 2 AM. Your API is timing out. Logs show a slow query. Metrics flag a spike in DB connections. Traces reveal a 5-second delay on a database call. But then the questions start:- Which database?- Does the query match the delay?- Why doesn’t this align with the connection pool metrics? Each tool uses different labels, db.name, database, sometimes nothing at all. Without a shared schema, connecting the dots is slow and frustrating.

Perform Distributed Tracing for your MCP system with OpenTelemetry

2025 has truly been the year of Agentic AI, with MCP (Model Context Protocol) emerging as one of its flashy and most talked-about innovations. While many products have seamlessly integrated MCP servers into their systems, these servers are increasingly being labelled as black boxes, opaque components that handle critical tasks but offer little visibility into what’s happening under the hood. We prompt an agent, a tool gets invoked, and a response is generated. But what really happens in between? And when something breaks, how do we trace the failure and debug it effectively?

Faster incident response through distributed tracing: Inside Glovo's use of Traces Drilldown

It’s almost 1 p.m. on a Monday afternoon and you’re hungry. You pull up your meal delivery app and select your favorite restaurant and dish. Then you go to check out and nothing happens. Your frustration mounts as you get hungrier by the minute. But there’s frustration on the other side of that transaction as well—engineers are scrambling to figure out what’s wrong as orders drop and revenue losses rise.