Operations | Monitoring | ITSM | DevOps | Cloud

February 2023

CommsFlow Messaging Templates | Blameless

Effective communication is critical during incidents. In order to minimize the impact of an incident and resolve it quickly, it's important that all stakeholders are kept informed and updated throughout the incident response process. However, communicating during an incident can be challenging, especially when dealing with multiple stakeholders and a high level of stress. On-call engineers can have their focus disrupted by switching out of their diagnostic tools to issue communications.

Types of Incident Retrospective Templates

When an incident occurs, it's important to take the time to review what happened, understand all the contributing factors, and identify systemic changes to prevent similar incidents from happening in the future. This process is known as an incident retrospective. However, conducting incident retrospectives can be time-consuming and difficult, especially when dealing with multiple stakeholders and a large amount of data.

Take the "work" out of your incident workflow: Integrating Blameless with Opsgenie

Assemble the right team for incident management fast with the new bidirectional integration of Blameless and OpsGenie. In this 30-minute live webinar, Blameless's Aaron Lober, Paul Chu, and Nicolas Philip show you how to seamlessly connect your alerting and service registry to your incident response processes. Webinar includes a live demo.

5 Best practices for developing a culture of continuous improvement

How do you create a great engineering team? Exclusively hire brilliant, tenured computer science PhDs. There we solved it. You can skip the next 400 words. (I can hear my college professor in my head saying “Humor might not be your strong suit”) Building a great engineering team isn’t easy. Understatement of the year. It’s not even a problem to be solved per se. We need to think about it as preparation to solve an infinite set of constantly evolving problems.

[SRE: From Theory to Practice] What's difficult about problem detection?

In this episode of FTTP, Kurt Andersen and Matt Davis are joined by Joanna Mazgaj and Laura Nolan to talk about the implications of and considerations for problem detection. Watch the full episode and hear them share personal stories about the types of challenges you might face. Ultimately, how do we explain and address the socio-technical concepts behind problem detection?

[SRE: From Theory to Practice] What's difficult about incident command?

Welcome back to our mini series of fireside chats with SRE experts talking about the realities of their day-to-day. Episode 2 gets intimate — What’s difficult about incident command? We invited Alyson van Hardenberg, Engineering Manager at Honeycomb.io, and Varun Pal, Staff SRE at Procore, to chat with Jake Englund and Matt Davis from the Blameless team. Watch the full conversation where they cover everything from methodologies and technical expertise to the human and social aspects of reliability engineering.

Announcing: Blameless + OpsGenie Integration

In the opening moments of an engineering incident, the most important aspect of a response plan is speed. Getting out of the gate quickly by leveraging automation to assemble the team can save precious moments during a critical engineering incident and make the difference between happy and unhappy customers downstream. This is why we’re excited to announce the integration of Blameless with OpsGenie.