
The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Demo Roundups! What's New in Schedules: Flexible Shifts + AI Conflict Resolution

Manual scheduling and on-call gaps cost your team sleep and sanity. Join us for a demo of PagerDuty's latest schedule experience improvements. From iCal-compatible shift management to AI-powered conflict resolution, see firsthand how to build bulletproof on-call coverage with minimal operational overhead.

Meeting Developers Where They Work: PagerDuty + Spotify Portal for Backstage

From the beginning, PagerDuty has been built by developers, for developers. Our mission has always been to help development teams build faster and resolve incidents more efficiently by meeting them where they work. Building on PagerDuty’s existing plugin for Spotify for Backstage, we are thrilled to announce the PagerDuty plugin for Spotify Portal for Backstage to continue bringing enterprise-grade incident management into even more developer workflows.

Best MSP Tools of 2025

Managed service providers (MSPs) are strong multitaskers, handling monitoring, documentation, security, infrastructure maintenance, support, and more for each of their clients. Clearly, MSPs cannot overlook the need for a strong set of tools. In the current state of IT, clients expect swift response and seamless service delivery at any time of day, which means MSPs must invest in a toolkit that lets them deliver high-quality service 24/7.

Service disruption on October 20, 2025

When the internet goes down, our primary job is to help everyone get back up as fast as possible. Of the almost half a million incidents we've helped our customers solve, some stand out for both their scale and their impact. One of these happened on Monday, October 20, when AWS had a widely covered major outage in its us-east-1 region from 07:11 to 10:53 UTC. We're hosted in multiple regions of Google Cloud, so the majority of our product was unaffected by the outage.

How to Easily Create a Rotation or Shift Schedule on a Calendar

Managing shift schedules, rotation cycles, and different types of shifts has always been one of the trickiest parts of workforce management. Whether you're coordinating day shifts, night shifts, evening shifts, or split shifts, keeping track of employee availability, selected days, and total hours across multiple teams is a challenge.
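The core of a rotation cycle can be sketched in a few lines. This is a minimal illustration, not the method of any particular scheduling tool: it assumes a simple round-robin rotation where each member covers a fixed number of consecutive days, and the function name `on_call` and its parameters are hypothetical.

```python
from datetime import date


def on_call(members: list[str], start: date, shift_days: int, day: date) -> str:
    """Return who is on duty on `day` in a simple round-robin rotation.

    Hypothetical model: members[0] starts on `start`, each member covers
    `shift_days` consecutive days, and the rotation then repeats.
    """
    if not members or shift_days < 1:
        raise ValueError("need at least one member and a positive shift length")
    elapsed = (day - start).days
    if elapsed < 0:
        raise ValueError("day is before the rotation start")
    # Which shift block `day` falls in, wrapped around the member list.
    return members[(elapsed // shift_days) % len(members)]


if __name__ == "__main__":
    team = ["Ana", "Ben", "Chen"]  # hypothetical team members
    start = date(2025, 11, 3)      # hypothetical rotation start (a Monday)
    for offset in (0, 7, 14, 21):
        d = date.fromordinal(start.toordinal() + offset)
        print(d, on_call(team, start, 7, d))
```

With weekly shifts, the rotation returns to the first member after three weeks. Real schedules layer availability, split shifts, and overrides on top of this basic cycle.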

How Do I Route Alerts by Location to the Right On-Call Team?

When your company has multiple offices or operational sites – whether that’s across the U.S. or around the world – getting alerts to the right team isn’t as easy as just checking who’s on duty. Events can come from a wide range of sources tied to different physical locations, time zones, or even separate departments, and not every alert is meant for every team. Let’s say your company has operations in New York, Dallas, and San Francisco.
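The basic idea of location-based routing can be sketched as a lookup from a site tag to a team, with a fallback for events that carry no recognizable location. This is a minimal sketch, not any alerting product's API: the `Alert` type, its `location` tag, the `ROUTES` table, and the team names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    summary: str
    # Hypothetical tags attached by the event source, e.g. {"location": "dallas"}.
    tags: dict[str, str] = field(default_factory=dict)


# Hypothetical routing table: normalized site tag -> on-call team.
ROUTES = {
    "new-york": "NYC Ops",
    "dallas": "Dallas Ops",
    "san-francisco": "SF Ops",
}
# Hypothetical catch-all team for alerts without a known location.
DEFAULT_TEAM = "Global NOC"


def route(alert: Alert) -> str:
    """Pick the receiving team from the alert's location tag."""
    location = alert.tags.get("location", "").strip().lower()
    return ROUTES.get(location, DEFAULT_TEAM)


if __name__ == "__main__":
    print(route(Alert("Disk full on db-01", {"location": "Dallas"})))
    print(route(Alert("Badge reader offline")))
```

The fallback team matters as much as the table: an alert with a missing or misspelled location should still reach someone rather than being silently dropped.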

When IT Alerts Go Bump in the Night: A Halloween Tale of IT Alerting with SIGNL4

As the witching hour approaches, your data center hums quietly – servers glowing like jack-o’-lanterns in the dark. Everything seems calm… until suddenly, your phone lights up with a chilling alert. CPU usage is spiking. Network latency is haunting your system. The ghost of downtime lurks nearby. Welcome to the spooky world of IT alerting – where nightmares come true if your team isn’t ready.

Detect and map third-party outages with Datadog External Provider Status

Modern applications depend on dozens of external cloud platforms, APIs, and SaaS services to function. But when those providers experience issues, engineers often spend valuable time asking a basic question: is the problem with us or with them? Provider-maintained status pages are often slow to update, leaving teams waiting for confirmation while incidents escalate. That delay prolongs investigations and puts customer trust at risk.

The Hidden Risk of DNS - Lessons from the AWS Outage & Why You Need DNS Spy Monitoring NOW

On October 20, 2025, much of the internet came to a halt. Apps wouldn't load. Payments failed. Cloud dashboards went dark. From Fortnite to Alexa, Snapchat, and countless business platforms, users across the world were suddenly offline, all because DNS broke inside Amazon Web Services' (AWS) us-east-1 region.