
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Investigate Issues in Slack: Grafana Cloud Slack App with AI

The Grafana Cloud app for Slack brings observability and incident response closer to where you and your teams already collaborate. Ask questions about system health, alerts, on-call schedules, and Grafana Cloud features; manage incidents and alerts; and collaborate with full context.

Happy Birthday to Us: Honeycomb 10 Year Manifesto, Part 1

Christine and I started Honeycomb in 2016, which means it’s been ten years. Christine, a developer, and I, an operations engineer, were both profoundly unhappy with the state of the art in monitoring and logging tools. The tools we had used at Facebook didn’t spray our signals around to a bunch of siloed-off pillars. They consolidated as much context as possible so we could properly explore it, the way every other non-software engineering team already takes for granted.

A Notification List Is Not a Team

In the previous post, we looked at how alert noise is rarely accidental. It’s usually the result of sensible decisions layered over time, until responsibility becomes diffuse and response slows. One of the most persistent assumptions behind this pattern is simple. If enough people are notified, someone will take responsibility. After more than fourteen years of working with engineering teams of every size and shape, we’ve seen this assumption fail repeatedly.

A new perspective on dashboard sprawl

Dashboards are supposed to answer questions, not create more of them. But investigations don't stop at a single view. The moment you want to understand one specific thing in detail, like a failing VM, a degraded service, or a slow pipeline, dashboards start to break down. You end up either building yet another dashboard or searching through many different ones. SquaredUp's Perspectives changes this.

Why distributed observability is straining and what new research reveals

Distributed systems quietly run much of today's digital world. People expect these systems to work reliably across regions and time zones for everything from money transfers to streaming platforms and AI-driven workloads. As organisations use more microservices, containers, and event-driven architectures, observability has become the main way for teams to understand what is happening in production.

Landscape Operations Automation beyond SAP Landscape Manager

During the summer of 2024, SAP quietly announced the end of the Landscape Manager product. You can find out more from SAP directly in the "LaMa Discontinued" community post, including linked SAP Notes. Unlike the news for Solution Manager or Focused Run, where the 2027 date signals a transition to extended support options, with LaMa the product is discontinued and extended support options aren’t available. For customers using LaMa, the announcement and timeline are disruptive.

Same Work, More Windows: Why AI Isn't Paying Off Yet (w/ Anthony Firmin)

In the first episode of a NEW ERA for the DEX Show, Tom (that's right, just Tom) welcomes back AI and digital transformation leader Anthony Firmin to unpack the reality of enterprise AI adoption. Drawing on hard-won, real-world experience, Anthony explores why so many organisations are stuck in the “messy middle” of AI, where usage rises but value doesn’t. The conversation digs into trust, experience debt, shallow versus deep AI, and why “same work, more windows” is an early warning sign leaders ignore at their peril. It’s a grounded, human-centred look at what it really takes to make AI improve work, not just change it.