On Call

Syncing PagerDuty Schedules to Slack Groups

We’ve posted before about how engineers on call at Honeycomb aren’t expected to do project work, and that whenever they’re not dealing with interruptions, they’re free to work on whatever will make the on-call experience better. However, all of our engineering rotations rely on hand-off meetings where they update the Slack groups with everyone who’s on call. During my last shift, a small problem kept causing friction for some of our incident management automation.

The ultimate guide to on-call schedules

An Ultimate Guide to on-call schedules? You might think this sounds overly grandiose for what’s essentially putting people into a list and rotating through them. But you’d be flat-out wrong. Getting your on-call setup correct is as real and as important as it gets, and getting things wrong can lead to prolonged incidents, burned-out employees, and a damaged company reputation.

On-Call Rotations and Schedules: A Guide for 2024

In an increasingly connected world where businesses operate around the clock, the importance of having an effective on-call system cannot be stressed enough. With technological advances and the expectation of immediate attention to business-critical issues, creating a reliable on-call rotation and schedule is essential for ensuring operational continuity. This comprehensive guide will walk you through the various aspects of on-call rotations and schedules that you need to consider for 2024.

6 Best Free OnCall Software in 2024, Open-Source and SaaS

In the world of IT and DevOps/SRE, managing incidents efficiently is paramount. When an unexpected issue arises, having the right OnCall software can make all the difference in minimizing downtime and maintaining service reliability. OnCall software ensures that there’s always someone available to respond to incidents, no matter the time of day. This tool is vital for businesses that operate around the clock and cannot afford to let issues go unresolved for long periods.

Leveraging AI for Efficient On-call Scheduling

Regardless of industry, creating and maintaining a highly functional incident management process is crucial for organizations of all sizes. Generative AI has many potential applications in this process and can significantly enhance the efficiency, accuracy, and speed of incident detection, analysis, and resolution. GenAI can be utilized across all stages of the incident management process, including preparation, response, communication, and learning.

Effective Slack on-call protocols for engineers

Talk of being on call is usually met with complaints. Here's how to change the narrative and develop a stronger, more compassionate process. A few years ago, I assumed oversight of a significant portion of our infrastructure. It was a complex undertaking that, if not managed and regulated properly, could have resulted in major disruptions and economic consequences over a large area.

Time, timezones, and scheduling

Our On-call product has been in the wild for a few months now, and in this post I want to talk about building a time-sensitive system and what we did to handle some of the challenges. I’ll cover what our scheduler is responsible for, the basics of working with time, and talk a bit about how we tested our system.

The Impact of On-Call on Mental Health

Lately, I have been thinking about the mental health effects that stem from working in the cybersecurity industry. And in my research, I came across an Afternoon Cyber Tea podcast that sparked my interest. During their talk, host Ann Johnson and Dr. Ryan Louie, MD, PhD, dissect parallels between those who work in cybersecurity and those who work in healthcare, and uncover how these types of jobs affect mental health.