The latest news and information on Incident Management, On-Call, Incident Response, and related technologies.
The Best in Enterprise Resilience™ Certification program affirms your organization’s readiness to manage critical events across a number of domains.
Technical teams are under more pressure than ever to move faster, protect revenue and availability, and push mean time to resolve (MTTR) ever lower. However, teams frequently find themselves encumbered by complex, repetitive, manual tasks instead of innovating. When urgent incidents arise, organizations often have to wait for specific developers or subject matter experts (SMEs) to deploy a fix.
4 best practices for breaking down silos and establishing a culture of shared responsibility toward reliability.
We know commitment issues are the real deal, especially when it comes to significant and costly tech investments. Understanding how the market is performing and what’s up ahead is critical for investing in AIOps. Our crew is here to help you through the challenging decision-making days and offer up the best analyst guidance.
In my experience as an SRE, I’ve learned some valuable lessons about how to respond to and learn from incidents. Declare and run retros for the small incidents: it’s less stressful, and action items become much more actionable. Decrease the time it takes to analyze an incident: you’ll remember more and learn more from it. Alert on pain felt by people, not computers: the only reason we declare incidents at all is the people on the other side of them.