
Latest Posts

How to prepare for and communicate during downtime

The unfortunate reality of running a web service is that every now and again, you’re going to have downtime. Even the best web companies have the occasional blip in service. Since downtime is inevitable, it’s best to plan ahead so that you’re ready when it happens. After all, prior preparation prevents poor performance.

How to not lose your s#!t during an incident

I am often asked how I always seem calm and poised during incidents. That impression is not entirely accurate: in these scenarios I am more like a duck, calm on the surface but paddling frantically underneath. Still, I have learned some tricks that help me stay calm and drive incidents to resolution.

Uniting technical and non-technical teams for better incident response

It takes a village to respond to and resolve incidents. But the teams involved in incident response often work in silos: SREs and devs are heads down fixing the problem, support is flooded with emails and tickets, and marketing and PR may be putting out fires on Twitter. Even if there’s some communication happening over chat or across desks, there’s typically room for improvement in getting these teams to work together when it matters most.

How to build a support team from the ground up

Building a support team is pretty hard, for a couple of reasons. There are no shortcuts to finding and training the right person. There are far more mediocre and poor support advocates out there than excellent ones, and the excellent ones are probably pretty happy where they are.