January 2020

New Postmortems Design and Commenting Functionality

One of the most important steps in an incident’s lifecycle is the postmortem. It provides essential time to reflect on what happened, what could have been done better, and how to build more resilience into a system. But we consistently hear from engineers that coordinating stakeholders to write a good postmortem typically involves a great deal of toil.

2020 SRE Predictions

It’s a new year, so what will 2020 have in store for SRE? Here’s our two cents: SRE adoption will only continue to grow. However, the practice and culture shift, rather than the role, will take priority in 2020. More people (not just SREs) will have a reliability mindset, shifting reliability left through the software lifecycle. SLIs, SLOs, and error budget policies will become common practice to make this shift actionable.

What Are Service-Level Objectives? Lessons Learned

Service-level objectives, or SLOs, are internal goals for the essential metrics of a service, such as uptime or response speed. You’re probably familiar with this definition, but what is the value of setting these goals? We’ll take a look at SLOs as both a powerful safety net and a tool to inform the allocation of engineering resources, while also considering the cultural lessons of SLO adoption.
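
To make the "safety net" idea concrete, here is a minimal sketch of how an SLO target is commonly turned into an error budget. The function name, the request-based availability metric, and the sample numbers are all assumptions chosen for illustration; they do not come from the post itself.

```python
def error_budget_remaining(slo_target: float,
                           total_requests: int,
                           failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent.

    Hypothetical helper: assumes a request-based availability SLO,
    where slo_target is the fraction of requests that must succeed
    (for example 0.999). A negative result means the budget is overspent.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1, exclusive")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # The budget is the number of failures the SLO tolerates in the window.
    allowed_failures = (1 - slo_target) * total_requests
    return 1 - failed_requests / allowed_failures


# Illustrative numbers only: a 99.9% target over one million requests
# tolerates about 1,000 failures, so 250 failures leaves roughly 75%.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"Error budget remaining: {remaining:.0%}")
```

A team might use a number like this to decide where engineering time goes: plenty of budget left suggests room to ship features, while a nearly spent or negative budget suggests prioritizing reliability work.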