
Incident Response

How Alert Notifications Make Incident Response More Effective

HR people have a saying: right person, right place, right time. The right resources can make all the difference when it counts. The same goes for incident management and response, where the wrong person, place, or time can all too often contribute to a mounting catastrophe. As systems grow, the right person really can make the difference during an outage, simply through their command or knowledge of the system.

How MBTA modernized incident response to reduce alert fatigue and improve collaboration

Citizens utilize mobile and consumer-facing applications in everyday life, so it’s no surprise that they demand seamless access and high availability of government services online. Whether it’s making payments or applying for benefits, citizens and constituents alike expect these services to be available around the clock.

Hear From Product: Incident Response Lightning Talk

Learn what's new with PagerDuty Incident Response from the Summit 2021 Launch. Our Product team shares how you can benefit from our latest updates and enhancements, with demos recorded live at Summit 2021 featuring PagerDuty Incident Context in MS Teams, Slack Insights previews, Stakeholder Updates in ChatOps, Priority-based Business Service Subscription, Past Incidents on Mobile, and Add Responder Notification Rules.

Demystifying the Hype Around XDR

Extended Detection and Response (XDR) has generated a lot of buzz recently among press, analysts, and even customers. There’s no denying that, at face value, its promise of reduced complexity and cost alongside improved detection and response is alluring. As security teams look to modernize their security tooling, they’re also looking for solutions to some of their largest challenges. Is XDR the answer? What is XDR, exactly, and how do you determine if it’s right for your organization?

Pragmatic Incident Response: 3 Lessons Learned from Failures

In my experience as an SRE, I’ve learned some valuable lessons about how to respond to and learn from incidents. Declare and run retros for the small incidents: it's less stressful, and action items become much more actionable. Decrease the time it takes to analyze an incident: you'll remember more and learn more from it. Alert on pain felt by people, not computers: the only reason we declare incidents at all is the people on the other side of them.

Enabling Faster Incident Response and Mitigating Security Risks in Financial Services

Software is eating the world. Digital transformation is top of mind for companies looking to meet ever-growing consumer demands and digitize manual processes. This isn’t unique to the technology industry: ecommerce, finance, healthcare, and other industries are all moving in this direction.

Kubernetes Incident Response: 5 Metrics to Watch

Kubernetes is a central part of modern IT infrastructure. Like any critical system, it is becoming a valuable target for attackers. To identify and respond to security threats, teams need metrics that flag anomalous activity and point to a direction for investigation.

How to Introduce Automation to Incident Response with Slack and PagerDuty

Major-incident war rooms are synonymous with stress. Pressure from executives, digging for a needle in a haystack, too much noise: it all weighs on your hardworking technical teams. Incident responders clearly need a more effective way to collaborate across technical teams, one that minimizes interruptions and keeps stakeholders up to date while ensuring everyone has the right level of context to do their job.