Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Observabilty for complex systems and related technologies.

Bridging the Gap Integrating Logs Metrics and Flow for Observability

In this video, we discuss handling both old and new systems in IT environments. From legacy SNMP setups to modern telemetry, most organizations juggle multiple data sources, which can make observability feel overwhelming. We explore how to combine logs, metrics, and flow data into one system that provides actionable insights. You’ll see practical examples of simplifying scattered tools and making sense of complex, disparate information. Understanding how these different types of data work together is key to getting observability right.

Smoother, smarter observability with the updated Site24x7 iOS 26

Enjoy improved control, clarity, and communication using the Site24x7 app on iOS 26. This update blends Apple's dynamic liquid glass design language with fast, secure, on-device AI summaries that help you observe your IT stack instantly and act decisively, from anywhere.

Background Job Observability Beyond the Queue

Background jobs handle the critical work that happens outside the request path: processing payments, sending emails, generating reports, syncing data. They keep applications running smoothly, but the signals they produce look different from API endpoints. Most teams start with queue metrics—how many jobs are waiting and how quickly they complete. These metrics provide the foundation, but job health extends beyond throughput.

LangChain Observability: Monitoring Guide for Production Apps

LangChain applications fail differently than traditional web apps. A single user request can trigger 15+ LLM calls, cost $5 in tokens, and fail silently without throwing errors. One team discovered a $12,000 OpenAI bill caused by a recursive chain with no monitoring. This guide shows how to implement observability for LangChain applications, giving you complete visibility into performance, costs, and errors before they impact your users or budget.

What is Service Catalog Observability and How Does It Work?

A service catalog gives teams a shared view of their systems—what services exist, who owns them, how dependencies are structured, and the SLAs that guide expectations. It’s an important part of development infrastructure because it helps everyone speak the same language about services. Service catalog observability builds on that foundation.

Introducing Cost Meter - Proactive Observability Cost Control with Per-Hour Granularity

The irony isn't lost on us - observability platforms are built to be proactive about system health, yet when it comes to managing observability costs themselves, teams are forced to be reactive. Today, that changes with Cost Meter, now live in our platform. Cost Meter transforms observability spend management from a monthly billing surprise into a proactive, data-driven process with hourly aggregated metrics that give you complete visibility into your telemetry ingestion patterns.

APM vs Observability: Observing beyond APM

In my previous post I made a bold, sweeping statement that APM is not - in the most specific sense - a subset of observability. Still standing by it I stand by that because words matter and - like many "monitoring engineers" (IT folks who make monitoring and observability their specialty) - I, too, bear scars from the flame-wars on Twitter back in the 2020's where we fought internecine battles over the proper definition of (and number of pillars in) “observability”.

Introducing Honeycomb Intelligence Canvas

Canvas is an AI-guided workspace inside Honeycomb that combines an AI assistant with an interactive notebook for visualizing query results and traces. You can ask a natural language question about your data and Canvas will immediately start exploring your traces, through multiple queries and other tools, to find the right next steps. Instead of having to write each query yourself, Canvas automatically proposes relational queries, comparisons, and visualizations that explain why an SLO fired or what changed after a deploy.