Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

The Debrief: AI can help you never forget incident follow-up actions again

Recording follow-up actions is an important final step of the incident response process, but it is easy to overlook some actions or forget them entirely. With Suggested Follow-ups, this is now a thing of the past. In this episode, you'll hear from Rob, the project lead for our latest feature, Suggested Follow-ups, for a peek behind the curtain.

Deliver Better Customer Experiences with PagerDuty for Customer Service

Want to deliver better customer experiences and meet your SLAs? PagerDuty for Customer Service Operations helps organizations connect the right teams at the right time, address urgent tickets, efficiently scale their 24/7 customer support model, and enhance cross-functional collaboration.

The Unplanned Show, Ep. 29: Major Incident Management with Davis and Chris

Not all incidents are created equal. How do you handle major incidents so that they don't spiral into a chaotic mess, incinerating productivity across too many teams? How do you prevent major incidents and learn from the ones you've had? "Major Incident Management" has been a practice for a long time, but as companies depend even more on digital services and revenue channels, while trying to do more with the same or less, something has to change.

Navigating the Evolving Landscape: A Deep Dive into REST API Versioning Strategies

As APIs evolve, ensuring seamless interactions and managing changes becomes crucial. While innovation and adaptability are essential, maintaining backward compatibility is equally important to avoid disrupting existing users. This is where REST API versioning comes in: it lets you introduce new features or breaking changes to your API in a controlled manner while older versions keep running smoothly.
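As a rough illustration of "keeping older versions running", here is a minimal sketch of one common strategy, URI path versioning, where the version is the first path segment (for example `/v1/users/42` vs `/v2/users/42`). Everything in it is hypothetical: the `users` resource, the handler names, and the `dispatch` function are made up for the example, and no web framework is assumed. Other strategies, such as carrying the version in a request header or media type, would change only how the version is extracted, not the idea of routing to side-by-side handlers.

```python
from typing import Callable, Dict, Tuple

# Hypothetical handlers: v1 returns a flat name string; v2 introduces a
# breaking change (a structured name) without removing v1.
def get_user_v1(user_id: int) -> dict:
    return {"id": user_id, "name": "Ada Lovelace"}

def get_user_v2(user_id: int) -> dict:
    return {"id": user_id, "name": {"first": "Ada", "last": "Lovelace"}}

# Route table keyed by (version, resource); both versions stay registered,
# so existing v1 clients keep working while new clients adopt v2.
ROUTES: Dict[Tuple[str, str], Callable[[int], dict]] = {
    ("v1", "users"): get_user_v1,
    ("v2", "users"): get_user_v2,
}

def dispatch(path: str) -> Tuple[int, dict]:
    """Resolve a URI-path-versioned request such as '/v2/users/42'.

    Returns an (HTTP status code, response body) pair.
    """
    parts = path.strip("/").split("/")
    if len(parts) != 3:
        return 404, {"error": "not found"}
    version, resource, raw_id = parts
    handler = ROUTES.get((version, resource))
    if handler is None:
        # Unknown or retired version: reject explicitly instead of guessing.
        return 404, {"error": f"unsupported version or resource: {version}/{resource}"}
    if not raw_id.isdigit():
        return 400, {"error": "id must be numeric"}
    return 200, handler(int(raw_id))
```

With this sketch, `dispatch("/v1/users/42")` still returns the original flat `name`, while `dispatch("/v2/users/42")` returns the new structured form, and an unregistered version such as `/v3/...` gets a 404. Retiring v1 later is just a matter of removing its route entry once clients have migrated.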

Negotiating Priorities Around Incident Investigations

There are countless challenges around incident investigations and reports. Aside from sensitive situations involving blame and corrective actions, tricky problems come up when discussions involve multiple stakeholders. The problems I'll explore in this blog, from the SRE perspective, are about time pressures (when to ship the investigation) and the type of report people expect.

Combating IT Alert Fatigue

As IT systems grow more complex, managing alerts and notifications without succumbing to the crippling effects of alert fatigue has never been more challenging. Alert fatigue occurs when the volume of notifications makes it impossible to distinguish signal from noise, desensitizing recipients to warnings, some of which turn out to represent critical issues.