Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Sponsored Post

Avantra 26: A Breath of Fresh Multi-Tenant AIR

There's a crackle and spark in the air at Avantra lately, and I'm so pleased to be writing this bit on what we've accomplished with the Avantra 26 release. Automated root cause analysis, multi-tenant management support for Cloud ALM, enhanced security operations, and financial operations monitoring for BTP: it's all there, and more. It's an exciting and innovative release for Avantra!

What's New in Scout Monitoring: June 2026

June was about finishing touches. The fun part. Node.js support, which we previewed in May, is live. Anomaly detection graduated with a rebuilt algorithm, per-monitor controls, and access from the API, CLI, and MCP server. We also kept pulling on the same thread from recent months: Scout data should be reachable from wherever you actually work. The MCP server now covers historical insights, anomaly events, and 30-day metrics. Discord is a notification channel. The CLI has a scout anomalies command.

Why Observability Isn't Enough for AI Coding Agents

Observability platforms collect pre-instrumented logs, metrics, and distributed traces to monitor production systems and surface failures to human engineers. The adoption of AI into engineering has led observability providers to offer those same signals to agents. This is often packaged as AI observability, but the signals themselves were designed around a human investigation loop. AI coding agents work faster, consume data differently, and need feedback as they work rather than after deployment.

What is Network Monitoring? A Guide for IT Teams

Over 90% of mid-sized and large companies estimate that a single hour of downtime now costs more than $300,000. The clock starts the moment something breaks, whether anyone has noticed it or not. And most outages don't start with alarms. They begin with a small issue inside the network: an overloaded switch, a saturated link, or an unstable interface. Left unnoticed, those small issues grow into user complaints, stalled work, lost revenue, and damaged customer trust.

Instrumenting AI Agents for the Agent Timeline: A Practical OpenTelemetry Guide

AI agents are nondeterministic, multi-step, and opaque. When one fails in production, "the model said something weird" is the cheapest, most useless line in your incident postmortem. To debug agents the way they actually run, you need telemetry that captures all of it, in order, with enough context to reconstruct what happened. The OpenTelemetry GenAI Semantic Conventions give you a vendor-neutral way to do exactly that.

Your AI isn't underperforming. Your data foundation is.

New research reveals why Australian businesses are entering the new financial year with bigger AI budgets and the same unsolved problem. One in three Australian businesses exceeded their AI budget last year, yet half of them plan to increase AI spending again this year. Meanwhile, the behaviour that caused those budget overruns remains largely unaddressed.

Unleashing Enterprise Agility: The Power of Portfolio Kanban Flow States

In the world of enterprise Agile, we face a persistent paradox: How do we empower individual teams to establish their own unique processes, while ensuring leadership maintains a clear, consistent view of the entire organization’s progress? For a long time, the answer was a compromise.

Logz.io Webinar Recap: A Four-Step Blueprint for Faster Root Cause Analysis

Incident investigations take so long not because the fix is hard, but because finding the right fix is. Most engineers spend 20 to 60 minutes just understanding what’s wrong before they can act. In that time they are not fixing anything, only trying to see the full picture. The framework that changes this has four steps: Orient, Isolate, Hypothesize, and Verify, and the order matters more than the tools.

What Is Agentic Observability? The Complete Guide for Enterprise Engineering Teams

TL;DR: Agentic observability uses AI agents to autonomously investigate incidents, identify root causes, and take action in production environments. Unlike traditional monitoring (which alerts and waits) or AIOps (which assists human analysis), agentic platforms conduct the investigation themselves. Key capabilities include autonomous incident triage, evidence-backed root cause analysis, alert noise reduction, and governed remediation.