How we built agentic incident response

AI already transforms how we detect, respond to, and resolve outages. Traditional workflows often force responders to switch between dashboards, sift through logs, and coordinate across fragmented channels under stress. This reactive, manual approach leads to slower resolution, higher operational costs, and burnout, especially as IT systems grow more complex. At ilert, we are not just discussing the future of incident management – we are actively building it.

Dynamic Status Pages on Demand

Clients expect transparency - especially when things go wrong. But manually updating a status page during an incident or maintenance window slows you down when speed matters most. Oh Dear’s status pages are more than just a pretty uptime dashboard. They’re fully API-driven and designed to scale with your workflow. Whether you manage five client sites or five hundred, you can create, update and sync status pages as needed. Here’s how to do it.

Robust Time Series Monitoring: Anomaly Detection Using Matrix Profile and Prophet

Monitoring production systems often feels like searching for a moving needle in a constantly shifting haystack. At Sentry, our goal was to empower customers to move beyond traditional threshold and percentage-based alerting. We aimed to help them detect subtle and complex anomalies in their systems in near real-time. This post will detail how our AI/ML team developed a time series anomaly detection system using Matrix Profile and Meta’s Prophet.

A Detailed Look at Calico Cloud Free Tier

As Kubernetes environments grow in scale and complexity, platform teams face increasing pressure to secure workloads without slowing down application delivery. But managing and enforcing network policies in Kubernetes is notoriously difficult—especially when visibility into pod-to-pod communication is limited or nonexistent. Teams are often forced to rely on manual traffic inspection, standalone logs, or trial-and-error policy changes, increasing the risk of misconfiguration and service disruption.

What is a Jitter Buffer and How It Works

If you've ever been on a choppy VoIP call or sat through a video meeting where people sounded like robots from the ’90s, you’ve likely run into a little thing called jitter. It’s one of those sneaky network issues that doesn’t always get the attention it deserves, until it ruins your real-time traffic. As IT pros and network admins, you're probably used to dealing with packet loss and latency. But jitter? That one's a bit trickier.

Top Kubernetes Monitoring Tools in 2025, And Why Alerting Is Critical for DevOps and SRE Teams

What are the best Kubernetes monitoring tools in 2025? And how can you ensure alerts actually drive action when something goes wrong? Kubernetes monitoring is critical for keeping your containerized applications healthy, but alerting is often overlooked. This blog compares popular tools like Prometheus and Datadog and explains why intelligent alerting solutions like OnPage are essential for effective incident response.

MCP Server Integration & Much More: What's New in VictoriaMetrics Cloud Q2 2025

Q2 2025 has brought another wave of improvements to VictoriaMetrics Cloud! If you tuned in to our latest Quarterly Virtual Meetup, you saw firsthand how we’re making observability even more accessible, powerful, and interactive.

MCP Observability with OpenTelemetry

2025 has truly been the year of Agentic AI, with MCP (Model Context Protocol) emerging as one of its flashiest and most talked-about innovations. While many products have seamlessly integrated MCP servers into their systems, these servers are increasingly being labelled as black boxes: opaque components that handle critical tasks but offer little visibility into what's happening under the hood. We prompt an agent, a tool gets invoked, and a response is generated. But what really happens in between?