The latest news and information on observability for complex systems and related technologies.

Scary Things Happen in Production. Context Helps You Find Them.

Mar 25, 2026 By Charity Majors In Honeycomb

Production is a rowdy place of chaos, especially at scale. When you have millions of requests per second flowing through your system, weird things are always happening. Outliers, unusual request patterns, spikes and pulses of traffic from unknown sources, port scanning…it’s all there. To the naked eye, it looks like noise. If you know what you are looking for…patterns emerge. The night sky: every dot is a request. Without intent, it's an undifferentiated field of light.


Smarter Alerts, Faster Root Cause, & Proactive IT Ops with SolarWinds AI Observability

Mar 25, 2026 By solarwindsinc In SolarWinds

Discover how AI is transforming IT operations with SolarWinds Observability. In this video, we showcase powerful new AI-driven features designed to help you detect issues faster, reduce alert noise, and stay ahead of performance problems across your entire stack. From applications and databases to networks, cloud infrastructure, and end-user experience, SolarWinds AI delivers deep insights where it matters most.


Top Root Cause Analysis Tools Built for Runtime Context

Mar 24, 2026 By Lightrun Team In Lightrun

Root cause analysis tools are designed to help engineering teams understand why failures happen in production and other remote environments. As modern systems become more distributed and input-dependent, many incidents cannot be reproduced outside live environments. The stakes are significant: high-impact IT outages cost organizations a median of $2 million per hour, with annual downtime costs reaching $76 million per organization.


Cribl Search Demo: Security Investigation

Mar 24, 2026 By Cribl In Cribl

In this demo, Nate Zemanek, Staff Solutions Engineer, shows how Cribl Search runs fast investigations. As an open data platform, Cribl Search lets you pull data from multiple sources and query everything from a single pane of glass. You’ll see how to run fast queries with the new lakehouse engine, search historical data with a federated approach, and bring everything together for full context. Then, use Notebooks to collaborate and share findings across teams to understand what happened, faster.


Next.js observability gaps and how to close them

Mar 24, 2026 By Sergiy Dybskiy In Sentry

This blog is based on a recent live workshop. You can watch the full livestream on YouTube. Next.js gives you a lot for free: server-side rendering, file-based routing, edge runtimes. What it doesn’t give you is a clear picture of what’s actually happening in production.


How a Runtime Aware AI SRE Agent Transforms System Reliability

Mar 24, 2026 By Lightrun Team In Lightrun

A runtime-aware AI SRE extends existing AI SRE approaches by moving beyond telemetry correlation into runtime-validated reliability. While the majority of AI SRE tools accelerate incident triage using logs, metrics, and traces, they cannot confirm execution behavior if critical runtime signals were never captured. By generating on-demand evidence inside running services, AI SREs can eliminate slow redeploy cycles, ensuring your distributed systems remain resilient under real-world traffic conditions.


From Observability to Action: How Product Analytics Is Closing the Loop in Modern Operations

Mar 24, 2026 By OpsMatters In OpsMatters

Over the past decade, observability has become a cornerstone of modern operations. Metrics, logs, and traces have given teams unprecedented visibility into how systems behave under real-world conditions. Infrastructure can be monitored in real time, incidents can be detected faster, and performance bottlenecks can be diagnosed with increasing precision. But for all its progress, observability still leaves an important question unanswered.


Leveraging Cognitive Diversity to Tackle System Complexity

Mar 23, 2026 By Nick Travaglini In Honeycomb

Most engineering leaders today understand that diversity matters. They've built teams that reflect a range of backgrounds, functions, and experience levels. They run postmortems, retrospectives, and architecture reviews that bring multiple voices to the table. They believe, not unreasonably, that this variety of perspectives leads to better decisions. But there's a problem hiding inside that assumption that can undermine everything: who people are is a surprisingly poor predictor of how they think.


How OpenRouter and Grafana Cloud bring observability to LLM-powered applications

Mar 23, 2026 By Chris Watts In Grafana

Chris Watts is Head of Enterprise Engineering at OpenRouter, building infrastructure for AI applications. He was previously at Amazon and a startup founder. As large language models become core infrastructure for more and more applications, teams are discovering a familiar challenge in a new context: you can't improve what you can't see.


Making encrypted Java traffic observable with eBPF

Mar 23, 2026 By Nikolay Sivko In Coroot

Coroot's node agent uses eBPF to capture network traffic at the kernel level. It hooks into syscalls like read and write, reads the first bytes of each payload, and detects the protocol: HTTP, MySQL, PostgreSQL, Redis, Kafka, and others. This works for any language and any framework without touching application code. For encrypted traffic, we attach eBPF uprobes to TLS library functions like SSL_write and SSL_read in OpenSSL, crypto/tls in Go, and rustls in Rust.
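
The idea of reading the first bytes of a payload and guessing the protocol can be illustrated in user space. The following is a minimal Python sketch, not Coroot's implementation: the function name `detect_protocol` and the handful of signatures it checks are assumptions chosen for illustration, and a real agent does this work inside eBPF programs with many more protocols. It also shows why encrypted traffic needs a different hook: at the syscall level a TLS connection exposes only a TLS record header, not the application protocol inside it.

```python
from typing import Optional

# Request-line prefixes for common HTTP/1.x methods (illustrative subset).
HTTP_METHODS = (b"GET ", b"POST ", b"PUT ", b"DELETE ", b"HEAD ", b"OPTIONS ", b"PATCH ")


def detect_protocol(payload: bytes) -> Optional[str]:
    """Guess a protocol from the first bytes of a payload (hypothetical helper)."""
    # HTTP/1.x: a request starts with a method, a response with "HTTP/1.".
    if payload.startswith(HTTP_METHODS) or payload.startswith(b"HTTP/1."):
        return "http"

    # Redis RESP: a command is an array, "*<count>\r\n" followed by bulk strings.
    if payload[:1] == b"*" and b"\r\n" in payload:
        count = payload[1:payload.index(b"\r\n")]
        if count.isdigit():
            return "redis"

    # TLS record header: content type 0x14..0x17, then major version 0x03.
    # The plaintext protocol is not visible here, only the encrypted record.
    if len(payload) >= 3 and 0x14 <= payload[0] <= 0x17 and payload[1] == 0x03:
        return "tls"

    return None


if __name__ == "__main__":
    print(detect_protocol(b"GET /health HTTP/1.1\r\n"))
    print(detect_protocol(b"*1\r\n$4\r\nPING\r\n"))
    print(detect_protocol(bytes([0x16, 0x03, 0x01, 0x00, 0x05])))
```

When the result is `"tls"`, prefix matching on the raw bytes cannot go further, which is the gap that hooking TLS library functions before encryption and after decryption is meant to close.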
