Operations | Monitoring | ITSM | DevOps | Cloud

The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

Spark: An IT Agent for Every Employee

It’s no secret that all software, and more broadly any technology that doesn’t move atoms, is ripe for disruption by the current and future capabilities of large language models. Any workflow, application, or digital process that can be expressed in code can be redesigned, improved, and transformed at speed and scale. AI-first companies will outpace legacy players by orders of magnitude, and many workflow-based models with humans in the loop will be fundamentally reshaped.

Organize your monitors with groups

This is one of our most requested features – and it’s finally here. Many of you told us that as your monitoring setup grows, it becomes harder to manage long lists of services and harder for users to quickly understand what’s actually affected during an incident. Monitor groups were built to solve exactly that. Now you can organize related monitors together and present a clearer, more structured view of system health everywhere StatusGator is used.

Top tips: Designing systems people won't work around

Top tips is a weekly column where we highlight what’s trending in the tech world today and list ways to explore these trends. This week, we’re looking at why people bypass systems—and how better design choices can prevent it. When people work around systems, it’s tempting to blame their behavior. In reality, most employee workarounds are signals.

VirtualMetric's Hybrid Security Data Collection Architecture: Performance and Scale Without Compromise

Modern security operations face a growing architectural challenge: collect telemetry from everywhere, process it in real time, and route it to multiple platforms while maintaining data sovereignty, avoiding agent sprawl, and keeping costs under control. Single-model collection strategies force security teams to make compromises. Agent-only models create operational overhead and maintenance risk. Agentless-only approaches simplify operations but limit depth and flexibility.

Observability with AI? Honeycomb with AI!

Since Honeycomb started, it has had a weakness: too many choices. Every field, custom or standard (and there are hundreds of them), is free to group, filter, and visualize in dozens of ways. Which ones are interesting? Honeycomb exists to help people understand custom software. It doesn’t pretend to know what matters in your application. That’s an interpretive task, not a programmatic one. Hey, computers can do interpretation now!

Lightrun Runtime Context MCP

In this video, Lightrun's Moshe Sambol walks you through the power of Lightrun MCP and Runtime Context, a game-changer for AI-assisted development. This integration lets developers debug live issues, inspect real-world variables, and verify fixes across environments, all without leaving the IDE. With Lightrun MCP, you can capture live transaction state directly from Staging and Production, identify root causes using real runtime values rather than static code alone, and verify fixes instantly without redeploying or context switching.

High Cardinality Metrics: How Prometheus and ClickHouse Handle Scale

TL;DR: Prometheus pays cardinality costs at write time (memory, index). ClickHouse pays at query time (aggregation memory). Neither is "better": they fail differently. Design your pipeline knowing which failure mode you're accepting.

Every month, someone posts "just use ClickHouse for metrics" or "Prometheus can't handle scale." Both statements contain a kernel of truth wrapped in dangerous oversimplification.
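To make the write-time versus query-time distinction concrete, here is a minimal toy model in Python. It is only a sketch and does not reflect how Prometheus or ClickHouse is actually implemented. The function names (`series_key`, `write_time_cardinality`, `query_time_groups`) and the sample metric with its `service` and `user_id` labels are all hypothetical, chosen purely for illustration. The idea it tries to show: an index keyed by the full label set grows with every distinct label combination as samples arrive, while a row store defers that cost until a query groups by a high-cardinality column.

```python
from collections import defaultdict

def series_key(name, labels):
    """Identify a series by metric name plus its full, sorted label set."""
    return (name, tuple(sorted(labels.items())))

def write_time_cardinality(samples):
    """Count distinct series: the state a series index must hold as samples arrive."""
    return len({series_key(name, labels) for name, labels, _ in samples})

def query_time_groups(samples, group_by):
    """Sum values per group at query time; state grows with the number of distinct groups."""
    totals = defaultdict(float)
    for _, labels, value in samples:
        totals[tuple(labels.get(k) for k in group_by)] += value
    return dict(totals)

# Hypothetical workload: 2 services x 1000 users, one sample per combination.
samples = [
    ("http_requests_total", {"service": svc, "user_id": f"u{u}"}, 1.0)
    for svc in ("api", "web")
    for u in range(1000)
]

if __name__ == "__main__":
    print("distinct series at write time:", write_time_cardinality(samples))
    print("groups when querying by service:", len(query_time_groups(samples, ["service"])))
    print("groups when querying by user_id:", len(query_time_groups(samples, ["user_id"])))
```

In this toy workload, the series count is fixed by the label combinations regardless of what is queried, whereas the aggregation state depends entirely on which labels the query groups by. That is the trade-off the TL;DR describes, under the simplifying assumptions stated above.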