
The latest news and information on incident management, on-call, incident response, and related technologies.

PagerDuty's Product Drop (May 2026)

PagerDuty’s monthly drops are here! May’s drop delivers four major updates to help teams work faster and smarter, including:

SRE Agent Enhancements: Triage just got turbocharged. New connectivity + new capabilities = faster resolution.

Shift-Based Schedules (GA planned for May): Schedules are more flexible than ever, with quick-start options, custom shifts, and multi-responder support for shadow training or increased coverage.

Post-Incident Reviews in the PagerDuty UI

Turn incidents into learnings and build resilient operations with real-time collaboration and actionable insights built directly into your PagerDuty workflow. Post-Incident Reviews in the PagerDuty UI are now in Early Access. Coming soon: AI-generated drafts and intelligent follow-up suggestions.

Faster incident investigation with BigPanda and ServiceNow Now Assist

When an incident occurs, an L2/3 engineer or SRE can spend 20–30 minutes investigating across alert consoles, combing through change records, and pinging teams on Slack or Microsoft Teams. Multiply that time across thousands of incidents per year, at an IT outage cost of $14,056 per minute, and the total is staggering. Enterprises can’t afford to waste time searching across disparate tools.

A guide to setting up alerts for a new service

When you launch a new service in production, you’re working with a lot of unknowns. You don’t yet know how it behaves under real traffic or which incidents are worth waking someone up for. That makes alerting for a new service a little different from what you’re used to with an established one. The goal in the early days isn’t to get everything perfectly configured. It’s to learn enough about the service to get your alerting right.

April 2026 Early Warning Signals

April saw widespread disruptions across SaaS platforms, developer tools, and cloud services. Login failures, pipeline issues, and general service outages were among the most common problems. StatusGator’s Early Warning Signals consistently identified these incidents ahead of official provider updates, and in several cases the lead time was significant: Bitbucket pipeline failures were detected 1 hour 17 minutes before the provider acknowledged them, while Claude performance issues surfaced 59 minutes early.

Prevent outages with PagerDuty incident retrospectives

Recurring incidents are a symptom of a broken process. Your teams are working hard to get services back online, but constantly battling the same problems is frustrating and unsustainable. This is not a failure of engineering ability but a gap in the learning that should follow an incident. When incident analysis focuses on finding a single person or team to blame, it creates a culture of fear.