The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Going beyond MTTx measuring what "good" incident management looks like

Mar 19, 2025 By Incident.io In Incident.io

Traditional MTTx metrics have long been the go-to measure for incident management effectiveness, but they often fail to provide a full picture or drive meaningful improvements. We analyzed data from over 100,000 incidents to develop new industry benchmark metrics that better define what "good" incident management looks like.

Rethinking WhatsApp Alerts - A Data-Driven Approach

Mar 19, 2025 By Kaushik Thirthappa In Spike

WhatsApp has become a major alerting channel for incident response teams. It's popular and, for many, a great alternative to SMS. In our 2024 recap, we mentioned that Spike sent over 25,000 alerts on WhatsApp. It is now the second most used alert channel for responders on Spike (up from fourth in 2023). But... I will be the first to admit that the WhatsApp alerts experience needed work to help responders react to incidents more quickly!

PagerDuty Setup: From Beginner to Pro in 10 Steps

Mar 18, 2025 By Kaushik Thirthappa In Spike

This comprehensive guide walks you through the complete PagerDuty setup process, organized into 10 steps. We've structured the guide to match your team's growth journey: starting with essential configurations for small teams, advancing to robust solutions for growing teams, and wrapping up with enterprise-grade features for large organizations. By the end, you'll have a fully operational incident management system on PagerDuty, tailored to your specific needs.

Finding the Right Tools for Digital Transformation

Mar 18, 2025 By Eric Forseter In PagerDuty

Given the current climate in the federal government, it’s critical that public sector IT leaders find innovative solutions to do more with less. That’s a real challenge for these leaders, who must balance current alert backlogs against their agency’s limited IT budget and resources. Every day, there are more than a thousand alerts to track down, response times are slowing, and some incident managers are burning out.

Feature Spotlight - Task Lists

Mar 17, 2025 By xMatters In xMatters

When an incident occurs, teams often perform a known set of steps in a specific order to help identify and triage the incident. For Base and Advanced plan users, the Incidents menu includes a Task Lists section where teams can build out priority lists for different incident types or use cases, such as a list of failover tasks or the tasks required to perform a deployment rollback. With task lists, Incident Commanders can be sure that resolvers know exactly what needs to be done to quickly resolve incidents.

Runbook Automation v5.10 Release Notes

Mar 14, 2025 By PagerDuty In PagerDuty

Join us to hear and see what's new in Runbook Automation and Rundeck v5.10!

Scientific Incident Management with Dan Slimmon

Mar 13, 2025 By Rootly In Rootly

Dan Slimmon is an incident management veteran who has worked at Etsy and HashiCorp and now leads consulting and training on pragmatic, non-bureaucratic incident response. In this episode, Dan shares his philosophy on "scientific incident response," the importance of hypothesis-driven troubleshooting, and why incidents should be seen as normal in complex systems.

Opsgenie is shutting down. Here's what that means, and how incident.io can help

Mar 13, 2025 By Stephen Whitworth In Incident.io

Atlassian recently announced they’ll be shutting down Opsgenie, their popular on-call alerting tool. After June 4, 2025, no new Opsgenie accounts will be created, and by April 5, 2027, the service will shut down completely. Users don’t seem happy about it. If you’re currently using Opsgenie, this news is significant. A key part of your incident response process is disappearing, and Atlassian suggests moving to their other products, like Jira Service Management or Compass.

A seven-step framework for running incident debriefs

Mar 13, 2025 By Chris Evans In Incident.io

Ever wrapped up an incident, thought 'Phew, glad that’s over,' only to feel your stomach drop when you see the dreaded "Incident Debrief" on your calendar? We've all been there. Incident debriefs don't need to feel like sitting through your least favorite school subject. They can (and should!) actually be engaging and useful. At incident.io, we've found a simple, repeatable, and blameless framework.

How we responded to a 2+ hour partial outage in Grafana Cloud

Mar 13, 2025 By Mick Gregg In Grafana

On Tuesday, Feb. 18, 2025, we experienced an outage that lasted approximately 150 minutes and impacted roughly 25% of our Grafana Cloud services. To our customers: we are very sorry and more than a little embarrassed that we stepped outside our own processes and advice to cause this. You rely on us to help monitor and troubleshoot your environments, and this type of incident obviously makes it harder for you to do that.
