The latest news and information on observability for complex systems and related technologies.

The Architecture Shift Powering Network Observability

If you work in network operations, you know that the only constant is the increasing complexity of the infrastructure you manage. The days of installing a monolithic software package on a single bare-metal server and letting it hum along for years are largely behind you. The software industry has shifted toward cloud-native architectures, microservices, and containerization. While these shifts promise agility and scalability, they also introduce significant operational complexity.

Kubernetes Network Observability: Comparing Calico, Cilium, Retina, and Netobserv

Calico, Cilium, Retina, and Netobserv: Which Observability Tool Is Right for Your Kubernetes Cluster? Network observability is a tale as old as the OSI model itself, and anyone who has managed a network or even a Kubernetes cluster knows the feeling: a service suddenly can’t reach its dependency, a pod is mysteriously offline, and the Slack alerts start rolling in. Investigating network connectivity issues in these complex, distributed environments can be incredibly time-consuming.

Why distributed observability is straining and what new research reveals

Distributed systems quietly run much of today's digital world. People expect these systems to work reliably across regions and time zones for everything from money transfers to streaming platforms and AI-driven workloads. As organisations use more microservices, containers, and event-driven architectures, observability has become the main way for teams to understand what is happening in production.

Heartbeat Behind the Metrics | Muraleedharan on support, scale, and seeing the product in the wild

What does observability look like when you’re responsible for customers at scale? In this episode of Heartbeat Behind the Metrics, Muraleedharan Sadhasivam, Head of Customer Success, talks about his 15-year journey at ManageEngine and the perspective you only get from being close to customers every day. He shares why custom dashboards matter so much, and why AppLogs is a feature he wishes more users explored to complete the MELT story. From querying logs to turning them into alerts and dashboards, he explains how real insights start when data is brought together.

How we built Grafana Assistant - a conversation about AI development for observability

This conversation with Grafana Labs engineers Mat Ryer, Cyril Tovena, and Sven Großmann dives deep into the engineering behind Grafana Assistant, exploring how agentic AI is transforming the observability landscape. From hackathon origins to sophisticated backend agents, the team shares candid lessons on building, scaling, and refining AI tools for engineers.

Kiro Can Now Reason With Lightrun's Live Runtime Context

AI code generation is fast. Making it reliable requires runtime context. Today, Kiro gains live runtime visibility with the Lightrun MCP. This grounds AI-assisted development in how code actually behaves at runtime. Kiro, the AI coding assistant from the teams at AWS, is built for velocity and intuition. It moves from specification to production with speed and structure, helping teams turn intent into working code. But until now, like every AI coding assistant, Kiro had a major blind spot.

How Honeycomb Supercharges OpenTelemetry for AI

It has become common knowledge that the nature of software development has changed as AI code generation and agent-based features gain adoption. In a perhaps more subtle shift, the fundamentals of software instrumentation are changing too. As OpenTelemetry becomes the standard instrumentation layer across enterprises, with thousands of developers (many from Honeycomb) actively contributing to it, the telemetry data it captures is itself evolving to meet the growing demand for rich context.