The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Why agentic AI is the future of IT change management

Every enterprise depends on continuous changes to its IT environment. New code releases, infrastructure updates, configuration changes, and security patches are all crucial to continuous innovation. These same changes are also a leading source of operational risk and one of the most common causes of failures at the network, infrastructure, and software layers, which in turn lead to outages.

Getting started with on-call

Setting up on-call is simpler than it seems. It comes down to a few clear decisions about your team and what your service actually needs. This guide walks you through those decisions. You’ll learn who to add to your rotation, how long shifts should last, when to hand off, and what coverage makes sense for your service. By the end, you’ll know exactly how to set up your first schedule and move from ad-hoc firefighting to organized incident response.

Why AI-driven automation in incident response is viable now

This article explains why AI-driven automation in incident response is feasible now. Teams can finally delegate repetitive and time-critical response tasks safely to AI agents, which operate with contextual awareness and human oversight. The result is faster response, higher service uptime, and less alert noise – without losing control.

How to Monitor SaaS Status in 2026: A Complete Guide

This is an updated and expanded version of the older guide. According to the 2025 State of SaaS report, organizations use an average of 106 SaaS apps. Staying on top of your SaaS vendors' status is as important as monitoring your own services. The Cloudflare, AWS, Azure, and Google Cloud outages in 2025 were strong reminders of this fact.

Democratizing Reliability: Giving Non-Engineers Real Operational Power with Dileshni Jayasinghe

Many companies don’t invest in incident management until something goes wrong. commonsku took a different path. In this episode of Humans of Reliability, Sylvain sits down with Dileshni Jayasinghe, VP of Technology at commonsku, to talk about what it really takes to introduce incident management in a mature, profitable SaaS that had never formalized it. From rolling out observability and incident tooling to practicing internal status updates before going public, Dileshni shares how her team built the right muscles before they were forced to.

PagerDuty Appoints Chris Ferro as Chief Legal Officer

PagerDuty, Inc. announces that Chris Ferro has joined the company as Chief Legal Officer. Ferro will oversee all legal functions at PagerDuty, including corporate, compliance, employment and product matters, with a focus on advancing business objectives while mitigating legal and regulatory risk.

AWS re:Invent 2025 - From Alert to Action: AWS + PagerDuty Agentic Ops

Hear how AWS and PagerDuty are transforming incident management with agentic and generative AI. Learn how agents within AWS Quick Suite and PagerDuty work together to detect, diagnose, and resolve incidents with less toil and less swivel-chair work. This session explores how AI collaboration is reshaping resilience across cloud environments.