Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Why Network Operations Needs Data-Centric AI

The discussion around AI in infrastructure and operations has become increasingly model-centric. Teams want to know what model a platform uses, how current it is, how much reasoning capacity it has, and how quickly it can be updated as the model landscape shifts. Those are reasonable questions, but they tend to arrive too early. In production operations, the more consequential question is what happens to the data before any model is asked to interpret it.

Building Real-Time Telemetry Pipelines for IRIG 106 compliance

Every second of a flight test produces a torrent of telemetry from engines, sensors, and control systems. Aerospace teams have captured this data for decades to verify performance and maintain safety, yet analysis often happens long after the mission ends. Engineers wait for downloads, conversions, and compliance checks before they can interpret results. That delay turns telemetry into a historical record instead of a feedback loop.

When your agents hallucinate at 2 am, it is not a model problem

The first time an AI assistant suggests "restart the service" during a live incident and nobody on the bridge can tell whether that suggestion came from a current runbook, a stale wiki page, or thin air, you stop caring about model benchmarks. You start caring about what the agent actually knew, where that knowledge came from, and whether you can trust the chain of reasoning behind it.

ITSM Maturity Playbook Live, Episode 1: Incident Management Masterclass

Join this 5-part series designed to help IT teams move from reactive, fragmented processes to a more structured, connected way of working. Each session focuses on a core area, from incident resolution and CMDB visibility to employee experience, service catalog design, and change governance, giving you practical frameworks you can apply right away. You’ll walk away with: Faster, more consistent incident resolution.

How to Identify LAN Issues (Local Area Network Problems)

Here is a reality that every network admin eventually runs into: users report slow apps, dropped calls, and broken connections, and the first instinct is to blame the ISP or the cloud provider. The ticket gets escalated, the ISP pushes back, and hours later, you find out the problem was sitting inside your own building the whole time. A saturated switch port. A misconfigured VLAN. A flaky patch cable in the server room.

Proactive vs Reactive Monitoring: What are the Differences?

A single hour of unplanned downtime can cost a mid-sized enterprise more than $300,000, according to ITIC report. Most of that cost comes from one place: teams find out about the problem after users do. That is the core limitation of reactive monitoring. It tells you something has failed, but doesn't tell you something is about to fail. This guide is for IT operations leads, platform and SRE engineers, and IT directors deciding how to evolve their monitoring practice.

Action trails: The missing link between AI and human trust

When people talk about trusting AI, they usually focus on the interface. It summarizes and uses confident language with a level of clarity that feels reliable. But that’s all window dressing. None of it builds trust. Trust doesn’t come from what the AI says. A verifiable record of what the AI did makes it trustworthy.

How to embed Grafana dashboards into web applications

Note: This post originally published in October 2023 and was updated in May 2026 to include new methods and options for embedding Grafana dashboards. Grafana dashboards are powerful and flexible tools for observing applications and infrastructure, so it’s no surprise we get a lot of questions from the community about how to embed them into their web applications.

Web API: your complete guide for custom integrations

Data is almost always scattered across too many tools. Usually, if you want to see it all in one place, you're stuck building messy pipelines or paying for a warehouse you don't really want. SquaredUp is a window into all those tools. It lets you see what’s happening across your entire stack in real time without moving any of the data. Think of it as a universal translator that lets your tools talk to each other so you can stop the manual digging and just see the big picture.