On Call


The 5 Central Tenets of a Great On-Call Culture

Working for VictorOps, and now Splunk, has allowed me to experience the on-call process from several distinct angles. And, working in a customer-facing role, I’ve witnessed the full spectrum of DevOps maturity – from downright DevOps mastery to the kinds of nightmare scenarios that haunt the dreams of on-call professionals.


How to avoid on-call burnout

It sucks to be on-call when processes are not well defined and streamlined. Especially around the holidays. You really don't want to hear your phone repeatedly going off right when you're sitting down to Christmas dinner with your loved ones or getting to unwrap the good presents (the ones with the sparkly wrapping paper :P). Your on-call team’s stress levels reflect the health of your system, the cleanliness of your code and the culture of your organization.


Reducing alert fatigue with GoAlert, Target's on-call scheduling and notification platform

At Sensu Summit 2019, Adam Westman, Sr. Engineering Manager at Target, introduced us to GoAlert, their on-call scheduling and notification open source project. In this post, I’ll recap his talk, sharing the journey that led them to build GoAlert, the problems they’ve solved, and how they use GoAlert with Sensu Go to simplify monitoring and reduce alert fatigue.


On-call doesn't have to be stressful

Being on-call is a critical duty that many operations and engineering teams must undertake to keep their services reliable and available. However, there are several pitfalls in the organization of on-call rotations and responsibilities that can lead to serious consequences for the services and the teams if not avoided.


Best Practices for Managing Multiple On-Call Teams

Alerting has come a long way, from the days of paging a single on-call administrator in the middle of the night to multiple on-call teams that run and manage incident response around the clock. As organizations grow and scale, responding to incidents also gets more complex, and you often need more than one team involved to successfully resolve an incident.


Why your devs suck at dev-on-call

Modern software production stops for no one, and everyone is needed to keep it rolling. Every dev is on-call. Great speed and friction produce a lot of heat, and when everything is on fire all the time, even the best devs and engineers struggle to keep the train speeding onwards without getting burned. What makes maintaining modern production so hard? And what is the difference between being good and being bad at dev-on-call? Let’s dive in and see.


Developing On-Call Escalation Processes That Work

In a world of highly integrated systems, microservices, cloud infrastructure and constant development, DevOps and IT teams are tasked with finding better ways to keep up with their own processes. By actively testing throughout the development lifecycle and preparing for incident response, you’ll build more resilient services up front while simultaneously being prepared when things go south.