March 2022

SREcon 2022 Americas Wrap Up

Hi everyone! We had a fantastic time at SREcon 2022 Americas last week, and I thought I’d share our stories and experiences. As the SRE community grows and evolves, these chances for collaboration become more and more important… and fun! Although I only attended virtually, I could still feel an exciting atmosphere as great minds came together.

Severity Levels (What They Are & Why They Matter)

Wondering about severity levels? We explain what incident severity levels are, how to classify them, and how they affect your incident management process. What are severity levels? Incident severity levels measure the impact an incident has on a system. In general, a lower-numbered severity level, such as SEV-1, denotes a higher impact on the system.
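To make the "lower number, higher impact" convention concrete, here is a minimal sketch in Python. The three levels, the `classify` helper, and the percentage thresholds are illustrative assumptions, not definitions taken from the post; real teams set their own criteria.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Lower number means higher impact, as with SEV-1 above."""
    SEV1 = 1  # assumed meaning: critical, widespread impact
    SEV2 = 2  # assumed meaning: significant but partial impact
    SEV3 = 3  # assumed meaning: minor impact


def classify(affected_users_pct: float) -> Severity:
    """Map a hypothetical 'percent of users affected' figure to a level.

    The 50% and 5% thresholds are placeholders for illustration only.
    """
    if affected_users_pct >= 50:
        return Severity.SEV1
    if affected_users_pct >= 5:
        return Severity.SEV2
    return Severity.SEV3


def more_severe(a: Severity, b: Severity) -> Severity:
    """Return the higher-impact level, which is the smaller number."""
    return min(a, b)
```

Because the levels are ordered integers, comparing or sorting incidents by severity needs no extra logic: `more_severe(Severity.SEV1, Severity.SEV3)` picks `Severity.SEV1`.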

Incident Response Lifecycle | A Complete Explanation

Wondering about the incident response lifecycle? We explain what it is and how each phase contributes to effective incident resolution. What is the incident response lifecycle? The incident response lifecycle is an organization’s framework for responding to an incident that disrupts service. The incident response lifecycle contains the following phases.

Postmortems Now Called Retrospectives in Blameless

Something big happened at Blameless this month: our “Postmortem” feature was renamed “Retrospective”. To the naysayers, I suppose you’re thinking, “This seems trivial. Different teams call it different names anyway, so why bother making the change?” First, let me say thank you for reading our blog, and I hope you read this one through to the end. Now, allow me to explain our reasoning and why we’re excited about this update.

Alert Fatigue in SRE: What It Is & How To Avoid It

Wondering about alert fatigue? We describe what it is, how it affects software development teams, and how to avoid it. What is alert fatigue? Alert fatigue is the phenomenon of employees becoming desensitized to alert messages because of the overwhelming volume of alerts, and the number of false positives, that they receive. The risk with alert fatigue is that important information will be overlooked or ignored.