The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Practical lessons for AI-enabled companies

We went live with our first set of AI-enabled features a few months ago. Needless to say, we learned a lot along the way, as this was the first time we had experimented with generative AI. Here, I'll share some of what we've learned as we've grappled with using LLMs to power new products at incident.io. This will be most applicable to companies at the application layer: those that are AI-enabled but are not AI companies.

PagerDuty Appoints Eduardo Crespo, Vice President of EMEA

PagerDuty, Inc. announces the appointment of Eduardo Crespo as vice president of EMEA. Crespo will lead PagerDuty's next phase of growth in the region, bringing the PagerDuty Operations Cloud to enterprise customers across EMEA to solve their biggest digital challenges.

Why more low severity incidents can be a good thing #incidentmanagement

In this clip, Dennis Henry of Okta explains why having more low-severity incidents can be a good thing. In last week's episode of The Debrief, we had on Colette Alexander, Director of Engineering at HashiCorp, to discuss some of the myths around incident response. One of the myths we spoke about was the idea that asking "why" is better than asking "how." In reality, asking "how" lets you focus on the contributing factors that led to an incident, whereas "why" tends to single out a person, which can lead to a lot of blame.

Mistakes happen for many reasons #incidentmanagement

In this clip, Dennis Henry of Okta explains why it's important to remember that mistakes happen for several reasons and don't have a single cause. In last week’s episode of The Debrief, we had on Colette Alexander, Director of Engineering at HashiCorp, to discuss some of the myths around incident response.

IRL to IAC: Your Environment to PagerDuty via Terraform

Figuring out how to represent your as-built environment in PagerDuty can be confusing for new users. There are a lot of components to PagerDuty that will help your team be successful managing incidents, integrating with other systems in your environment, running workflows, and using automation. Your organization might have a lot of these components – users, teams, services, integrations, orchestrations, etc.

Live event recap: Humanizing the on-call experience

There's no two ways about it: on-call is stressful. But with humans at the center, it's especially important to find ways to make it as manageable and empathetic as possible. In this webinar with our friends at ELC, incident.io VP of Engineering, Norberto Lopes, and Intercom Staff Product Engineer, Andrej Blagojević, discuss their own experiences with on-call and how the process can be better.

Incident Management: 5 Best Practices for Seamless Operations

Website incidents can happen at any time, for any reason. Your website might stop responding to customers. Performance may slow down. Main pages may start returning client or server errors. And when incidents do strike, they bring frustration and confusion to your customers, leading to lower trust and engagement.

Why "why" is the wrong question to be asking after incidents with Dennis Henry of Okta

In last week's episode of The Debrief, we had on Colette Alexander, Director of Engineering at HashiCorp, to discuss some of the myths around incident response. One of the myths we spoke about was the idea that asking "why" is better than asking "how." In reality, asking "how" lets you focus on the contributing factors that led to an incident, whereas "why" tends to single out a person, which can lead to a lot of blame.