Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

PD Summit21: AWS and PagerDuty: Better Together -- A Digital Transformation Journey

PagerDuty’s platform for real-time operations helps teams manage a complex transition from siloed and centralized approaches to multiple, distributed teams supporting a hybrid cloud infrastructure. To make this journey successful, one thing is clear: your people, technology, and operational processes need to be aligned in real time. That’s why we’re continuing to invest in our partnership with AWS. The integrations we’re bringing to market have always been centered on unlocking AWS’s unprecedented scale and agility for our joint customers.

PD Summit21: Sumo Logic: Streamline Incident Management to Drive Application Modernization

As application modernization increases complexity, managing the signals that applications generate becomes increasingly important for reducing alert fatigue, maintaining reliability, and accelerating innovation. Sumo Logic provides a unique, two-way integration with PagerDuty. It collects incident messages from PagerDuty and populates pre-configured dashboards that give a complete view of alerts by displaying top incidents, escalations, teams, and urgency. It also lets users send notifications to PagerDuty when critical conditions in their applications or infrastructure are detected in Sumo Logic.

PD Summit21: MUX: Video Observability: Operational Alerting for Responding to Issues In Real-time

Streaming video accounts for the majority of internet traffic, and your applications and infrastructure almost certainly include video. Mux Data lets you monitor the real-time quality of experience delivered to your video viewers, and by integrating it with PagerDuty you can automate a response and reduce the time to resolution when something goes wrong. We will cover the basics of video monitoring and how integrating with PagerDuty can ensure a great experience for viewers.

What's New: Updates to Event Intelligence, Integrations, and More!

If you thought that the product announcements from PagerDuty’s largest event of the year, PagerDuty Summit 2021, were all we had in store for you, think again! We’re excited to announce that the July Release comes with a new set of updates and enhancements to the PagerDuty platform. You can learn about our latest capabilities via the Q1 PagerDuty Pulse or read below for the highlights.

Operational Resilience: Grow Your Business Despite Increasing Threats

While most businesses have an emergency preparedness plan in place, organizations have to wonder if their current plans are enough to defend against the growing list of major incidents and critical events affecting business. According to the 2020-21 Major Incident Management Annual report, an emergency preparedness plan isn’t enough to combat the growing threat landscape. To combat the rise in critical events, organizations must maximize operational resilience.

Monitoring and Alerting 101: Monitoring Best Practices

An effective monitoring system is paramount to smooth business operations. As demand for a fast, responsive software experience grows, monitoring becomes indispensable. Monitoring systems enable IT teams to proactively observe the health and responsiveness of critical environments and applications. Without monitoring, organizations learn of system issues only when customers or internal departments report them.

PD Summit21: Responding to Chaos with Gremlin and PagerDuty

Incident response is something you hope to never need, but when you do, you want it to go smoothly and seamlessly. Normally, the knowledge of how to handle incidents within your company is built up over time, getting better with each incident. While tools such as PagerDuty's Major Incidents Application can help you recover quickly, the process you follow is just as important. This documentation lets you learn from the start something that has taken us years to build up, giving you a head start on handling a major incident in a way that leads to the fastest possible recovery.