Incident Management

LISA19 - Lightning Talk by Squadcast: How to SRE without an SRE on Your Team

Squadcast is an incident management tool that’s purpose-built for SRE. Create a blameless culture by reducing the need for physical war rooms, centralize SLO dashboards, unify internal and external SLIs, automate incident resolution with Squadcast Actions, and build a knowledge base for handling incidents effectively.

pagerduty

Birth of the Angry Bear Ringtone

Did you know ringtones in the PagerDuty mobile app are one of the most-requested features customers contact us about? And have you ever wondered what makes a good ringtone and how we come up with them? Imagine the following: You’re on an on-call rotation with no end in sight. There might be a trusted responder you can page in for help, but they’re already burnt out. The Incident Commander won’t be of any assistance, because you are the Incident Commander.

victorops

Top Metrics for Measuring DevOps Delivery Value

Software developers and operations teams are constantly improving the way they move code into production and execute tests to maintain consistent delivery of reliable services. But how do most organizations track the success of organizational changes? When a company adopts DevOps principles, how does it show the value of these changes to the engineering teams and the overall business?

pagerduty

Postmortems vs. Retrospectives: When (and How) to Use Each Effectively

When we announced the launch of our Retrospectives Guide, we wrote about the value of scaling the continuous improvement mindset beyond Product Development at PagerDuty by establishing the RetroDuty community. In this installment of our blog post series on retrospectives, I highlight the differences between postmortems and retrospectives. You might have heard of postmortems and/or retrospectives before reading our guides.

squadcast

Pavlos Ratis shares his experience on being an SRE

Pavlos is a Site Reliability Engineer based in Munich, Germany. He likes building software and expanding his knowledge around the reliability of services and their infrastructure. He has created a few open-source SRE projects, such as awesome-sre, Wheel of Misfortune, Availability Calculator, and awesome-chaos-engineering, to help teams and individuals get on board with the SRE culture.

victorops

Healthcare as a Guide for Incident Response and Incident Management

The core of the DevOps movement is about breaking down barriers between developers and operations, allowing both sides to work as a team. This means making sure that everyone has access to the systems they require while enabling cross-tool visibility and collaboration.

logsign

8 Best Incident Response Use Cases

Incident response is a well-organized approach used by organizations’ IT departments to combat and manage the aftermath of a cyberattack or a security breach. The purpose of incident response is to get out of the nightmare by limiting the damage and reducing the costs and recovery time of the incident. The people who perform incident response form a Computer Security Incident Response Team (CSIRT), and they follow the company’s Incident Response Plan (IRP).

exigence

From Mayhem to Modernization: The Evolution of Critical Incident Management

Let’s face it, managing a critical incident has never been a walk in the park. Even in the “good old days,” before the great cloud revolution and the onslaught of digital transformations, an incident often meant mayhem. Processes were manual, time-consuming, and difficult to execute, document, and learn from. Getting all the right people in the “same room” at the right time was nearly impossible. Lots of time was wasted chasing down the right folks.

firehydrant

Severity Matrix Updates

We’re on a mission to make responding to incidents a bit less chaotic. One of the best features we offer (we’re definitely not biased, no way) is a simple way to define how a severity gets determined when you open an incident. We call it the severity matrix, and today it has a new look. Previously, we had a preset list of conditions and impact that allowed you to pick a severity that matched them.