March 2020

Best Practices for Pragmatic Incident Command

Mar 31, 2020 By Blameless In Blameless

The goal of this piece is to provide some practical advice on how teams can coordinate and respond to complex, dynamic incidents. After all, incidents are unplanned investments that surface valuable learnings for improvement. For the purposes of this blog, we define incidents as situations where there is a need for coordination among multiple people working on the same problem. There will be incidents where this is not the case.

SRE for Business Continuity in the Face of Uncertainty

Mar 24, 2020 By Hannah Culver In Blameless

No, it won’t be possible to continue operating business-as-usual. For the foreseeable future, teams across the world will be dealing with cutbacks, infrastructure instability, and more. However, with SRE best practices, your team can embrace resilience and adapt through this difficult time.

Our Top 5 On-Call Practices

Mar 19, 2020 By Emily Arnott In Blameless

On-call: you may see it as a necessary evil. When responding to incidents quickly can make or break your reputation, designating people across the team to be ready to react at all hours of the day is a necessity, but often creates immense stress while eating into personal lives. It isn’t a surprise that many engineers have horror stories about the difficulty of carrying a pager around the clock. But does on-call have to be so dreadful? We think not.

6 Steps to a More Effective Postmortem

Mar 19, 2020 By Jacob Warren In Blameless

Detailed and specific description of impact? Check. In-depth root cause analysis? Check. Clearly defined and easy to follow resolution? Check. Postmortems present an incredible learning opportunity, despite the inherent cost of time and effort. They ensure an incident is documented, that all contributing factors are understood, and that effective preventative actions have been put in place to reduce the likelihood or impact of recurrence.

The Incident Response Approach to Remote Work

Mar 16, 2020 By Emily Arnott In Blameless

In response to recent events, many organizations are implementing social distancing programs such as remote work. Successfully transitioning to remote work does come with challenges, but the right practices and attitudes can make it much less painful (and safer for you than heading into the office). We like to think of incidents as “unplanned investments,” and a sudden switch to remote work could be considered an unplanned investment of its own.

Great Incident Response Requires 3 Major Components

Mar 12, 2020 By Hannah Culver In Blameless

With remote work becoming more common, and distributed teams the norm, incident response has become even trickier. Years ago, everyone would gather in a war room and sort through the issue together, boots on the ground. Now, things have shifted. Remote work is only projected to increase, and teams need to be able to adapt in order to resolve incidents quickly and efficiently, even if team members are a thousand miles away. But how can we make great incident response a reality?

How ITIL, DevOps, and SRE Work Together for your Organization

Mar 10, 2020 By Hannah Culver In Blameless

When someone asks what type of “shop” your organization is, can you answer confidently that it’s ITIL, DevOps, or SRE? Maybe some people can, but if you’re a large enterprise, the answer is likely a combination of several of these operating models, especially since SRE has become a key implementation of DevOps. ITIL can work effectively alongside DevOps and SRE principles, though at first glance they appear to be different species.

How do we Apply SRE Outside of Engineering with Google's Dave Rensin

Mar 3, 2020 By Blameless In Blameless

The first keynote speaker, he is a senior director of engineering at Google. You might know him as the guy who founded and leads the customer reliability engineering function at Google. CRE, this is a team that teaches the world SRE principles and practices. Now I want to tell you a bit more about him, because I think he has a very unique view and perspective. He is deeply compassionate and intuitive as a teacher, not just a lecturer.
