
OTel Updates: OpenTelemetry Proposes Changes to Stability, Releases, and Semantic Conventions

Over the past year, the Governance Committee ran user interviews and surveys with organizations deploying OpenTelemetry at scale. A few patterns came up consistently. Stability levels aren't always obvious: when you install an OTel distribution, some components might be experimental or alpha without clear markers, which makes it harder to evaluate what's production-ready. And instrumentation libraries sometimes wait on semantic conventions.

How to Track Down the Real Cause of Sudden Latency Spikes

Start with distributed tracing to find which service is slow, then use continuous profiling to see why the code is slow, and finally apply high-cardinality analysis to identify which users or conditions trigger the problem. It's 2 AM. Your phone buzzes. Users are reporting timeouts. The metrics dashboard shows p99 latency spiking from 200ms to 4 seconds, but everything else looks normal: CPU at 60%, memory stable, no error spikes. A quick pod restart helps briefly, then latency climbs right back up.
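That last step only works if the identifying attributes are on your spans in the first place. As a rough sketch (the service name, attribute keys, and handler below are made up for illustration, and a configured TracerProvider with an exporter is assumed), recording them with the OpenTelemetry Python SDK could look like this:

```python
from opentelemetry import trace

tracer = trace.get_tracer("checkout-service")

def process_order(user_id: str, cart_items: list) -> None:
    ...  # placeholder for your existing business logic

def handle_checkout(user_id: str, cart_items: list) -> None:
    with tracer.start_as_current_span("checkout") as span:
        # High-cardinality attributes let the backend answer "which users or
        # conditions see the 4-second p99?" instead of just "which service".
        span.set_attribute("user.id", user_id)
        span.set_attribute("cart.size", len(cart_items))
        process_order(user_id, cart_items)
```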

Which Observability Tool Helps with Visibility Without Overspend

If you’re trying to control observability spend without cutting visibility, the platforms that usually offer the best cost balance at enterprise scale are Last9, Grafana Cloud, Elastic, and Chronosphere — depending on the shape of your telemetry and the level of operational ownership you want.

OTel Updates: Unroll Processor Now in Collector Contrib

Some log sources bundle multiple events into a single record before shipping them. This is common with VPC flow logs, CloudWatch exports, and certain Windows endpoint collectors. While this batching approach is efficient for transport, it creates challenges when you need to filter, search, or correlate individual events. When a log record contains an array of 47 events, your analytics tool sees one entry instead of 47 distinct records.
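The unroll processor handles this inside the Collector, where it's driven by configuration rather than code. Purely to illustrate what "unrolling" means (the field names below are hypothetical, not the processor's actual options), the transformation is roughly:

```python
import copy

def unroll(record: dict, field: str = "events") -> list[dict]:
    """Split one batched log record into one record per embedded event."""
    events = record.get(field)
    if not isinstance(events, list):
        return [record]  # nothing to unroll, pass the record through
    children = []
    for event in events:
        child = copy.deepcopy(record)  # keep the shared source metadata
        child[field] = event           # each child carries exactly one event
        children.append(child)
    return children

batched = {"source": "vpc-flow", "events": [{"bytes": 120}, {"bytes": 88}]}
print(len(unroll(batched)))  # 2 distinct records instead of 1
```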

9 Monitoring Tools That Deliver AI-Native Anomaly Detection

The observability market has moved beyond manual threshold-setting. Modern platforms use statistical algorithms, machine learning, and causal AI to detect anomalies automatically. Some work immediately after deployment. Others train on your data for better accuracy. Each approach has technical trade-offs worth understanding. This guide compares how nine monitoring solutions handle automated anomaly detection and root cause analysis.
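For a sense of what the simplest statistical approach looks like, here's a toy rolling z-score check; it isn't any vendor's algorithm, and the window size and threshold are arbitrary:

```python
from collections import deque
from statistics import mean, stdev

def make_detector(window: int = 60, threshold: float = 3.0):
    history = deque(maxlen=window)

    def is_anomaly(value: float) -> bool:
        anomalous = False
        if len(history) >= 10:  # wait for a minimal baseline
            mu, sigma = mean(history), stdev(history)
            anomalous = sigma > 0 and abs(value - mu) / sigma > threshold
        history.append(value)
        return anomalous

    return is_anomaly

check = make_detector()
for latency_ms in [200, 210, 195, 205, 198, 202, 207, 199, 201, 204, 4000]:
    if check(latency_ms):
        print(f"anomaly: {latency_ms} ms")  # flags only the 4000 ms sample
```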

Instrument Jenkins With OpenTelemetry

You can instrument Jenkins with OpenTelemetry using the official plugin and an OpenTelemetry Collector, then send the data to a backend like Last9 to understand where pipeline latency and failures actually originate. Jenkins provides job status and console logs, but it doesn't show how time is distributed across stages, agents, plugins, and external systems. OpenTelemetry fills that gap by emitting traces, metrics, and logs in a standard format that any OTLP-compatible backend can process.
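Most of that setup lives in the plugin and Collector configuration rather than in code you write. To picture the shape of the data the plugin exports, here's a rough Python model of a build as one parent span with a child span per stage; the span and attribute names are illustrative, not the plugin's actual naming:

```python
import time
from opentelemetry import trace

tracer = trace.get_tracer("pipeline-model")

# Assumes a TracerProvider with an OTLP exporter has been configured;
# otherwise the spans are created but never leave the process.
with tracer.start_as_current_span("BUILD my-service #42") as build:
    build.set_attribute("pipeline.name", "my-service")
    for stage in ("checkout", "test", "deploy"):
        with tracer.start_as_current_span(f"stage: {stage}"):
            time.sleep(0.01)  # stand-in for real stage work
```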

7 Observability Solutions for Full-Fidelity Telemetry

You don’t have to choose between capturing every signal and keeping costs predictable. Modern observability stacks blend full-fidelity storage (time series or columnar systems like ClickHouse and Apache Druid), tail-based sampling for heavy traffic, and tiered storage (hot/warm/cold with S3-backed archives). This gives you full-fidelity incident forensics with the day-to-day cost profile of a sampled setup.
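The tail-based sampling piece is easiest to see as a decision made after a trace completes, which is what lets you keep every error and every slow trace while downsampling healthy traffic. A conceptual sketch of that decision (not the Collector's actual processor, and with arbitrary thresholds):

```python
import random

def keep_trace(spans: list[dict], slow_ms: float = 1000,
               baseline_rate: float = 0.05) -> bool:
    if any(s.get("status") == "error" for s in spans):
        return True                          # always keep failures
    total_ms = sum(s.get("duration_ms", 0) for s in spans)
    if total_ms > slow_ms:
        return True                          # always keep slow traces
    return random.random() < baseline_rate   # sample the healthy majority

trace_spans = [{"status": "ok", "duration_ms": 40},
               {"status": "ok", "duration_ms": 35}]
print(keep_trace(trace_spans))  # usually False; an error or slow span flips it
```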

Top 7 Observability Platforms That Auto-Discover Services

You can use an observability platform that automatically discovers your services and provides ready-to-use dashboards with minimal setup. If you're running a system where microservices come and go, containers shift around, or serverless functions scale up quickly, this kind of setup saves you a lot of time: you gain visibility as soon as something goes live, without any extra steps on your part. In this blog, we talk about the top seven platforms that offer these capabilities.

How to Reduce Log Data Costs Without Losing Important Signals

You can cut your log costs by removing repetitive, low-value logs early and keeping only the parts that genuinely help you understand issues. Modern systems generate logs far faster than you expect. Even when your workload stays stable, infrastructure components, retries, and background workers continue producing a steady stream of repeated entries.
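One concrete form "removing repetitive logs early" can take is suppressing exact repeats of a message within a short window, close to where the logs are produced. This is a toy sketch with arbitrary choices, not a drop-in pipeline component:

```python
import time

class RepeatSuppressor:
    def __init__(self, window_seconds: float = 60.0):
        self.window = window_seconds
        self.last_seen: dict[str, float] = {}
        self.dropped = 0  # keep a count so the signal isn't silently lost

    def should_keep(self, message: str) -> bool:
        now = time.monotonic()
        last = self.last_seen.get(message)
        self.last_seen[message] = now
        if last is not None and now - last < self.window:
            self.dropped += 1   # same message seen recently: drop it
            return False
        return True

suppressor = RepeatSuppressor()
for line in ["connection retry", "connection retry", "payment failed"]:
    if suppressor.should_keep(line):
        print(line)  # prints "connection retry" once, then "payment failed"
```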

OTel Updates: Complex Attributes Now Supported Across All Signals

OpenTelemetry now supports maps, heterogeneous arrays, and byte arrays across all signals. Here’s where these new types shine — and where simple primitives still fit naturally. If you’ve been working with OpenTelemetry for a while, you’re likely familiar with the straightforward key-value approach to attributes. It’s simple, fast, and works well with how most telemetry backends store, index, and query data.
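SDK and backend support for the new types will land at different times, so treat this as a conceptual comparison rather than working SDK code: the same request headers expressed as flattened primitive keys versus a single map-valued attribute (the attribute names are illustrative):

```python
# Before: every header becomes its own primitive attribute key.
flattened = {
    "http.request.header.content_type": "application/json",
    "http.request.header.accept": "application/json",
}

# After: one map-valued attribute keeps the headers together as a unit,
# which is what the new complex-attribute support makes representable.
as_map = {
    "http.request.headers": {
        "content-type": "application/json",
        "accept": "application/json",
    },
}
```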