The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Sponsored Post

Avantra 25.2: Enhancing Security and Reducing Complexity in Hybrid SAP Landscapes

I am pleased to announce the release of Avantra 25.2! While 25.2 is a service release focused on software stability, it introduces several powerful new features designed to streamline SAP automation and improve operational resilience. Let's break down the key deliverables and benefits for Avantra users in this release.

Blameless Postmortem: Foundation of Site Reliability

When systems fail, the instinct to find someone to blame runs deep. But what if assigning fault actually makes your systems less reliable? A blameless postmortem culture transforms how teams learn from incidents, creating stronger systems and more effective incident response processes.

Grafana community dashboards: Memorable use cases of 2025

Every year, Grafana dashboards surface in new corners of the world. And this year, they even reached beyond this world—helping one team land on the moon and another monitor the planet’s health with orbiting satellites. Meanwhile, back here on Earth, the community used Grafana to track everything from wind turbines and wastewater to March Madness and Taylor Swift’s worldwide tour. Here’s a look back at some of the most memorable Grafana community dashboards of 2025.

Drive business outcomes with Unit Economics in Datadog Cloud Cost Management

See how Datadog turns cloud usage and performance data into actionable business insights by helping teams calculate unit economics to measure and optimize the efficiency of every service. Datadog bridges the gap between cloud costs and business value, helping organizations get the most out of their cloud investment.

Part 3: What If IT Stopped Reacting to Incidents and Started Predicting Them?

Enterprises are experiencing a turning point. Systems scale faster than teams can, AI is rewriting the rhythms of operations, and the cost of downtime grows heavier every quarter. In this new landscape, reacting is no longer enough. Teams need foresight. They need to get ahead of the issue. They need a different model entirely. This third installment centers on a simple but transformative idea. What if IT operations could finally step out of reaction mode and move into anticipation?

Detect, diagnose, and resolve network issues easily with CNM Network Health

In many organizations, developers, SREs, network engineers, and security teams work in specialized domains, which can make it hard to establish a shared view of network health. As a result, engineers often struggle to determine when a network problem that originates outside of their domain of expertise is the root cause of an incident. This lack of visibility slows investigations and delays remediation.

Driving AI ROI: How Datadog connects cost, performance, and infrastructure so you can scale responsibly

AI innovation has accelerated faster than most organizations’ ability to monitor and manage it. The shift from experimentation to production-scale workloads has driven a new class of operational challenges: rising GPU costs, opaque model performance, and the difficulty of linking spend to business value. As AI investments grow, executives need a unified way to measure efficiency and return without slowing down innovation.