The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.
Be sure to register for the launch webinar on Thursday, March 30th to learn more about the latest release from the PagerDuty Operations Cloud. Rundeck by PagerDuty has long helped organizations bridge operational silos and automate away IT tasks so teams can spend more time building and less time putting out fires. While this mission still rings true today, our vision is to extend this reality and revolutionize all operations while continuing to build trust.
Mean Time To Repair, or MTTR, is a critical metric in IT incident management that measures the average time it takes to fix a system failure, that is, the average time an IT team needs to recover from an incident. It is a fundamental metric for IT teams to track when analyzing how efficiently they resolve incidents.
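As a rough illustration of the definition above, MTTR can be computed as the total repair time divided by the number of incidents. The sketch below is a minimal example, not a reference implementation: the function name `mean_time_to_repair`, the (detected, resolved) tuple layout, and the sample timestamps are all assumptions made up for illustration, and real teams may choose different start and end points for the repair window.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(incidents):
    """Return MTTR as a timedelta: total repair time / number of incidents.

    `incidents` is a list of (detected_at, resolved_at) datetime pairs.
    This layout is a hypothetical one chosen for the example.
    """
    if not incidents:
        raise ValueError("need at least one incident to compute MTTR")
    # Sum the repair duration of each incident, starting from a zero timedelta.
    total = sum(
        (resolved - detected for detected, resolved in incidents),
        timedelta(),
    )
    return total / len(incidents)


# Made-up sample data: three incidents lasting 30, 90, and 60 minutes.
incidents = [
    (datetime(2023, 3, 1, 9, 0), datetime(2023, 3, 1, 9, 30)),
    (datetime(2023, 3, 2, 14, 0), datetime(2023, 3, 2, 15, 30)),
    (datetime(2023, 3, 3, 22, 0), datetime(2023, 3, 3, 23, 0)),
]

mttr = mean_time_to_repair(incidents)
print(mttr)  # 1:00:00
```

Because the result is a plain average, a single long outage can dominate it, so it is often worth looking at the individual durations alongside the mean.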
Incident Management has evolved considerably over the last couple of decades. Traditionally limited to just an on-call team and an alerting system, it now includes automated Incident Response combined with a complex set of SRE workflows.
On Thursday, March 9, 2023, something was afoot at our primary bank, SVB. By Friday, March 10, 2023, messages from our investors helped us quickly understand that FireHydrant needed to maneuver through a complex incident that was unfolding. Operational incidents are incidents like any other.