Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Hear From Product: PagerDuty for Customer Service Operations Lightning Talk

Learn what's new with PagerDuty for Customer Service Operations from the Summit 2021 launch. Our Product team shares how you can benefit from the latest updates and enhancements, with demos recorded live at Summit 2021 featuring the PagerDuty Salesforce Service Cloud Integration V3, the new Customer Service SKU, and Round Robin Workflows (Round Robin Scheduling).

PagerDuty Pulse Q1 FY22 Full Webinar

In this edition of PagerDuty Pulse, you’ll get to view our most recent platform updates and enhancements (March 2021 – June 2021) that extend from AIOps and automation to a variety of new integrations. Teams must leverage PagerDuty and Modern Digital Operations to automate the day-to-day toil of repetitive tasks, master modern operations with full-service ownership, seamlessly collaborate across the organization, and accelerate enterprise-wide response by enabling customer service operations and business stakeholders.

Less is more: Incident management and monitoring in hybrid IT infrastructures

Many companies are continuously modernizing their infrastructure, but there is no standard path to the perfect IT infrastructure. Still, hybrid architectures have become the status quo in enterprises. Almost all organizations have migrated at least parts of their assets to the cloud or run applications as cloud services. At the same time, businesses want to dovetail their IT architecture with software development and are therefore embracing dynamic infrastructures.

Resilience in Action E9: Vulnerability, Compassion, and Post-Incident Reviews in the Emergency Room with Dr. Al'ai Alvarez

What can software engineers learn from the post-incident reviews that physicians do in the emergency room? In our ninth episode, Christina, a member of the Blameless strategy team, guest-hosts the podcast to interview both Kurt Andersen and Al'ai Alvarez, MD (@alvarezzzy). Dr. Alvarez is an assistant clinical professor of Emergency Medicine at Stanford. Clinically, he’s an emergency physician.

What is Incident Management in IT and Why does it matter?

Incident management is the process of identifying and resolving problems that occur in IT services. Incident Management is also used as a metric to measure the health of the IT Service Desk. Let’s discuss what incident management is, why it matters to your business, and how you can apply it to your organization.

Splunk On-Call prevents and cuts downtime episode length by half

Your Answer: Escalate the right alerts to the right on-call people for fast collaboration and issue resolution with Splunk On-Call. Reduce burnout and make on-call suck less with a complete ChatOps experience that's integrated with your IT stack and incident reporting.

Chapter Nine: In Which Dinesh Experiments with Chaos Engineering

Another day, another drama! This one, though, is very much of my own making. I have been wanting to try my hand at a bit of chaos engineering for some time now, but C&Js just hasn’t been ready. Sarah’s been up for it too, though, at Animapanions. And now that our CIO, Charlie, has seen MTTR drop across every single technology team, thanks to the rollout of Moogsoft and the new incident management system (kudos to James), it’s pilot day.