Operations | Monitoring | ITSM | DevOps | Cloud

Latest Posts

The Reverse Red Herring

During an incident, time is fungible. At points it seems to go way too fast, and at times it seems like an eternity for a command to complete. More importantly, however, is how it feels to be in an incident. It’s a heightened state of being, where any and every piece of information could be “the one” that helps crack open what is really going on. Likewise, there is an inherent distrust of incoming information.

Post-Incident Review | Why It's Important & How It's Done

Curious about the post-incident review process? We give a complete explanation of post-incident reviews and why they are important and discuss best practices. What is a post-incident review? A post-incident review is an evaluation of the incident response process. The goal of the process is to have clear actions to improve the incident response process and to also help prevent further incidents.

SRE: From Theory to Practice | What's difficult about on-call?

We launched the first episode of a webinar series to tackle one of the major challenges facing organizations: on-call. SRE: From Theory to Practice - What’s difficult about on-call sees Blameless engineers Kurt Andersen and Matt Davis joined by Yvonne Lam, staff software engineer at Kong, and Charles Cary, CEO of Shoreline, for a fireside chat about everything on-call. As software becomes more ubiquitous and necessary in our lives, our standards for reliability grow alongside it.

SRE Adoption | A 2-Year Retrospective (From A Business Point-Of-View)

This month I hit my 2-year anniversary with Blameless and as our industry progresses and matures, I thought it would be a good opportunity to look back and review how far we have come and also ruminate on where we’re headed. Our shared vision at Blameless is to help engineering teams adopt reliability practices with ease and advance to a resilient culture.
Featured Post

The State of Incidents and Site Reliability: Q&A with Blameless SRE Architect Kurt Andersen

In the latest of an occasional series, today we hear from Kurt Andersen, SRE Architect at Blameless, discussing the evolution of incident management, current trends in site reliability affecting engineering teams, as well as an update on how Blameless is addressing the needs of SRE and DevOps.

Managing Burnout | Tips To Minimize The Impact

Burnout is real. Today, the source of burnout can be anything from pandemic fatigue, to the onslaught of political divisiveness, or simply the pace of life worldwide. Whatever the culprit, we’re living in a stressful time. People working in cloud native environments definitely feel burnt out. Silicon Valley investor Marc Andreessen famously said, “Software is eating the world,” and that seems to be quite true. High demand is fueling churn. System and cloud operators feel pressure.