Customers over control: how we measure On-call reliability

May 28, 2026 By Article In Incident.io

Our On-call product has a lot of great features: configuring escalation paths, viewing rotas and schedules, requesting cover, etc. However, when framing its reliability, we reduce it down to two critical pieces of functionality: It’s not that we’re happy if only these parts are working, but they are the most important parts. In this post, I'll go into more detail on how we think about their reliability.

Engineering teams in 2027

May 19, 2026 By Article In Incident.io

There's a conversation I keep having with our design partners at incident.io. It starts when I ask "what are you doing with AI internally?" and lands in a similar place every time. The shape of how their engineering teams work is changing fast. Not in vague "AI is transforming everything" ways, but in concrete, repeatable patterns. Different companies are building the same things. The frontier teams are six to twelve months ahead of the average, and they're describing the same future.

PagerDuty Rescue Program

May 13, 2026 By incident-io In Incident.io

We're announcing the PagerDuty Rescue Program. PagerDuty worked. For a long time, it was the standard. But the world's changed, and PagerDuty hasn't. The single biggest reason teams stay on PagerDuty isn’t the product - it’s the pain of leaving. So, we’ve removed every barrier. You've wanted out for a while. Now, nothing is stopping you.

Humans aren't fast enough for 4 9's

May 11, 2026 By Article In Incident.io

When thinking about Service Level Objectives (SLOs) and contractual Service Level Agreements (SLAs) for availability, I always like to put the percentages into concrete numbers. It’s easy to lose track of what “99.95%” availability actually means, and easier still to underestimate how much harder 99.99% is to achieve than 99.95%. In concrete terms, 99.95% availability allows 21 minutes and 55 seconds of downtime per month.
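As a rough illustration of that arithmetic, here is a minimal Python sketch that turns an availability target into a monthly downtime budget. It assumes an "average month" of 365.25 / 12 days, which reproduces the 21 minutes and 55 seconds figure quoted above; the function names are hypothetical and not taken from the post.

```python
from datetime import timedelta

# Assumption: an "average month" is 365.25 / 12 days (about 30.44 days).
AVERAGE_MONTH = timedelta(days=365.25 / 12)


def monthly_downtime_budget(availability_percent: float) -> timedelta:
    """Return the downtime allowed per average month at the given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return AVERAGE_MONTH * unavailable_fraction


def format_budget(budget: timedelta) -> str:
    """Format a downtime budget as whole minutes and seconds, rounded to the second."""
    minutes, seconds = divmod(round(budget.total_seconds()), 60)
    return f"{minutes}m {seconds:02d}s"


for target in (99.9, 99.95, 99.99):
    print(f"{target}% -> {format_budget(monthly_downtime_budget(target))}")
# 99.9% -> 43m 50s
# 99.95% -> 21m 55s
# 99.99% -> 4m 23s
```

Under the same assumption, moving from 99.95% to 99.99% shrinks the monthly budget from about 22 minutes to under 4 and a half minutes.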

Behind-the-scenes: Building Post-mortems | incident.io team

Apr 29, 2026 By incident-io In Incident.io

We rebuilt our post-mortems from the ground up. In this video, Pete and the engineering team talk through how they built it: the decisions they made, the problems they were solving, and what it took to ship AI-native post-mortems.

Who's on call? How Claude helped us calculate this 2,500x faster

Apr 28, 2026 By Article In Incident.io

Schedules are a core part of any on-call system. In ours, they define who to page and when. But people use them in lots of other ways too: checking their next shift, asking for cover while at the gym, keeping a Slack user group up to date, or updating a Linear triage responsibility. For many of our customers, they’re one of the main ways they interact with our product, and as they’re such a foundational part of On-call, it’s very important they work well.

What does using AI for post-mortems actually mean?

Apr 23, 2026 By Article In Incident.io

Everyone is using AI to help with post-mortems now. The pitch is obvious: post-mortems are time-consuming, the blank page is brutal, and AI is very good at producing structured, confident-sounding documents quickly. We're not here to push back on that. We've built AI into our own post-mortem experience, pulling your Slack thread, timeline, PRs, and custom fields together and giving your team a meaningful starting point in seconds. We think that's genuinely valuable, and the teams using it agree.

How it feels to run an incident with AI SRE

Apr 23, 2026 By Article In Incident.io

We've been building the broader incident.io platform for several years now, and one thing we've learned is that UX matters more here than almost anywhere else. When an incident fires, there's no room for poorly designed interfaces or fumbling through features you haven't touched in a while. The product has to be ergonomic: easy to pick up, easy to navigate, with the right things at your fingertips at exactly the right moment. We've put a lot of effort into this over the last 5 years.

Why post-mortem action items die

Apr 16, 2026 By Article In Incident.io

You can run the best debrief of your life. Honest timeline, blameless tone, real insights. People leave the room nodding. And then nothing happens. This is the last mile problem of post-mortems - and it's an easy trap to fall into. When you've just been through a stressful incident, getting it back up is the priority. Once it's over, the post-mortem itself can feel like the finish line. You've documented what happened, been honest about it, identified what went wrong. It feels like the work is done.

How to migrate your paging tool without breaking your team

Mar 20, 2026 By Article In Incident.io

Most engineering teams don’t migrate their on-call and paging systems unless absolutely necessary. No matter how painful their current solution, it's one of those changes that people put off for as long as possible because the cost is real. The disruption, the retraining, the risk of missing a critical page during the transition. It's not something you do on a whim.
