
The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

5 Offbeat on-call rotations that work

Feb 12, 2026 By Sreekar In Spike

Most teams choose standard on-call patterns like weekly or daily rotations. But sometimes a less conventional rotation can solve a specific problem or just fit better with how your team works. This guide walks you through five offbeat on-call rotations. For each, we look at why it might work for you and the challenges involved. This helps you see the full picture before you decide to try them out. Let’s dive in!

Follow-the-sun and other on-call models

Feb 12, 2026 By Sreekar In Spike

Most teams run on-call using rotation-based schedules where responsibility shifts every few days or weeks. But some situations call for different models that change who responds based on time zones, expertise, or the type of incident triggered. This guide walks you through six on-call models that work outside the standard rotation patterns.

Turning Data Into Decisions with the xMatters Incident AI Agent

Feb 12, 2026 By Jon Skog In xMatters

When an incident hits, the gap between awareness and action can make all the difference. Responders know the pain: endless tool-switching, chasing updates, and fragmented data. It’s not a lack of capability that slows response; it’s the lack of context and connection. That’s why we built the xMatters Incident AI Agent, a purpose-built, conversational assistant that brings intelligence and automation directly into the heart of incident response.

AWS CloudFront Outage (Feb 2026): Timeline, Cascade, and Lessons

Feb 11, 2026 By Nuno Tomas In isDown

At approximately 9:15 PM UTC on February 10, 2026, Amazon CloudFront began returning NXDOMAIN responses for DNS queries against specific distributions. In practical terms: DNS was telling users that services behind those distributions simply didn't exist. The root cause was a DNS resolution failure within CloudFront's infrastructure that quickly spread to eight interconnected AWS services.

ilert now supports a native WhaTap integration

Feb 11, 2026 By Sirine Karray In iLert

ilert now supports a native WhaTap integration, connecting AI-native observability with AI-first incident management in a seamless workflow. This integration allows DevOps, SRE, and IT teams to move instantly from detection to resolution – cutting through alert noise, improving coordination, and dramatically reducing MTTR in even the most complex IT environments.

How to Create and Manage Incidents in Uptime.com

Feb 11, 2026 By Uptime Website Monitoring In uptime

Learn how to create and manage incidents on your Uptime.com Status Page to keep your subscribers informed about service disruptions and maintenance events in real-time. In this tutorial, we'll cover understanding incident statuses (Investigating, Identified, Monitoring, Resolved, and more), three ways to create a new incident, configuring incident details and timelines, adding updates with Markdown formatting, managing and editing incidents, notifying Status Page subscribers, and using the REST API for incident management.

View Video

SIGNL4 February Release - SCIM, Caller ID, Team Admin Invites

Feb 9, 2026 By SIGNL4 In SIGNL4

We’re excited to share SIGNL4’s first product update of 2026! Automate user onboarding and offboarding with SCIM, control whether Team Admins can invite new users, and choose the caller ID used for call routing.

Reference architecture: The blueprint for safe and scalable autonomy in SRE and DevOps

Feb 9, 2026 By Leah Wessels In iLert

Everyone wants autonomous incident response. Most teams are building it wrong. The ultimate goal of autonomy in SRE and DevOps is the capacity of a system to not only detect incidents but to resolve them independently through intelligent self-regulation. However, true autonomy isn't born from automating random, isolated tasks. It requires a stable foundation: a Reference Architecture.

Silent Failure in Production ML: Why the Most Dangerous Model Bugs don't Throw Errors

Feb 9, 2026 By Ritika Bramhe In OnPage

You’ve done it. Your machine learning model is live in production. It’s serving predictions, powering features, and quietly doing its job. Dashboards are green. There are no errors in the logs. Nothing appears broken. And yet, something is wrong. Predictions are getting less reliable. Users are waiting a little longer for responses. Conversion rates are slipping. Trust is eroding, but no alert fires, no system crashes, and no one knows there’s a problem until the damage has been done.

PagerDuty x Backstage Plugin Demo: Eliminate Context Switching for On-Call Engineers

Feb 9, 2026 By PagerDuty Inc. In PagerDuty

Join Rocío, Product Manager of the Forward Deploying Engineering team at PagerDuty, as she demonstrates how the PagerDuty Backstage plugin transforms incident response by bringing critical operational data directly into your developer portal.

View Video