Every quarter, we host a roundtable discussion centered around the challenges encountered by incident responders at the world’s leading organizations. These discussions are lightly facilitated and vendor-agnostic, with a carefully curated group of experts. Everyone brings their own unique perspective and experience to the group as we dive deep into the real-world challenges incident responders are facing today.
Before I stumbled into the tech industry (a story for another day), I spent several years in the customer service world as a server and front-of-house manager in restaurants. It was in these jobs that I first honed some critical skills that would later lead me on the path to incident response.
It has been lightly revised and reposted with his permission from the original article on Medium. Leading major incident responses can be extremely stressful. You have to quickly gather an ad-hoc team, figure out what went wrong, identify a fix, and make sure the fix doesn’t make things worse, all while senior leadership is breathing down your neck. Are we having fun yet? Many people think having a dedicated incident commander role will solve the problem.
This blog post is adapted from my talk at SRECon EMEA 2023 - original slides are available here! Status pages are a simple yet underutilized element of incident communication. Done well, they’re a low-lift way to keep your customers and stakeholders informed when incidents impact them. But without a solid approach, updating status pages can easily become a tedious and often neglected task during incidents. In this post, we’ll cover some tips to get your status page right.
You’re in the incident channel rocking yet another incident. Comms are flowing, resolution is in sight, the team is grinding, and you’re feeling good. Then…
In today’s world, resilience is no longer an optional aspiration or a methodology to experiment with; it has become a necessity for sustained success in software development and IT operations. As DevOps and Agile teams push boundaries, develop new methodologies, and drive innovation, they need the ability to recover quickly from failures, adapt to changing conditions, and maintain high performance under pressure.
We’re proud to share that we’ve been recognized as a High Performer and Enterprise Leader in Incident Management for the sixth consecutive quarter in the G2 Summer 2023 Report! In total, Rootly received nine G2 awards in the Summer Report.