
Latest Posts

Resilience in Action, E2: Adaptability, ego, and scaling with Tim Banks

Resilience in Action is a podcast about all things resilience, from SRE to software engineering, to how it affects our personal lives, and more. Resilience in Action is hosted by Blameless Staff SRE Amy Tobey. Amy has been an SRE and DevOps practitioner since before those names existed. She cares deeply about her community of SREs and wants to take what she’s learned over the 20+ years of her career to help others.

Managing Burnout During COVID-19

During this crisis, we’re all trying our best to keep ourselves and others healthy, manage chaotic homes, and prioritize our mental health. However, this can be difficult even when we’re not experiencing a pandemic. With the added stress, burnout is occurring at an alarming rate: people are unable to separate home from work, they face the increased burden of keeping everything on along with heightened on-call loads, and communication is strained.

Deserted Island DevOps Recap

On April 30, 2020, Austin Parker, Principal Developer Advocate at Lightstep and co-host of On-Call Me Maybe, hosted a one-of-a-kind DevOps conference. With the cancellation of events all over the world in the face of COVID-19, virtual conferences have been blooming (see our coverage of Failover Conf here), but Deserted Island DevOps was the first ever conference held in the world of Animal Crossing: New Horizons.

How resilience and security shift left: An interview with the EVP Product & Engineering and CISO at FOX

Melody Hildebrandt is the Executive Vice President of Product & Engineering and CISO at FOX. Her career journey began with designing wargames for the Department of Defense. She has gained tremendous experience in disaster planning, testing, security, and resilience at organizations such as Palantir and others. Recently, she led the effort to plan for and execute FOX’s digital streaming of Super Bowl 54, including taking over an entire sound stage in the process.

How We Use Blameless to Power Remote Work

As with all other companies, the Blameless team is adapting to a world of remote work where distributed teams will need to get better than ever at staying aligned and efficient. We’ve been relying on Blameless more and more to improve how we collaborate virtually. Here are some of the top workflows and tips on how we have been using Blameless internally to streamline remote productivity.

A "Retrospective" of Amy Tobey's "The Future of DevOps is Resilience Engineering"

On April 22, 2020, at 11:20 AM PST, Amy Tobey began her talk “The Future of DevOps is Resilience Engineering” at Gremlin’s Failover Conf. This talk focused on key concepts from DevOps as a way to understand resilience engineering. Amy began by having the audience participate in a group breathing exercise, taking 3 deep breaths, before speaking about the yoga practice of pranayama as a way to understand DevOps.

Reflections on Gremlin's Failover Conf

On April 21, 2020, thousands of industry professionals came together virtually to attend a revolutionary conference, Gremlin’s Failover Conf. With dozens of cancelled events, social distancing policies, and heightened stress due to the current crisis, it was more necessary than ever to take a moment to learn, share, and talk to one another about something we are all passionate about. We loved the experience at Failover Conf, and want to share some of our favorite parts with you.

Getting SRE Buy-in from C-Levels for Error Budgets and SLOs, Part 3

You now have postmortems properly implemented, automated, and well-structured. You’re generating reports and data automatically based on all your incidents. Two levels of management have agreed to your SRE buy-in efforts. That is a huge accomplishment! If you’re here, you’re gaining great traction in adopting SRE best practices, but the battle is not won yet. The hardest, yet most strategically important, effort will be proving to your C-levels why they should buy into SRE.

Thought Leadership Panel: What is a "real" SRE?

Blameless recently had the privilege of hosting SRE leaders Craig Sebenik, David Blank-Edelman, and Kurt Andersen to discuss how SREs can approach work as done vs. work as imagined, how to define SRE and DevOps and the complementary nature of the two, the ethics of purchasing packaged versions of open source software, and more. The transcript below has been lightly edited, and if you’re interested in watching the full panel, you can do so here.

Getting SRE Buy-in from a VP or Director for Automated Metrics and Continuous Learning, Part 2

After getting managerial approval for incident management, your SRE buy-in program is well underway. How can you prove that it’s effective, and that adopting more best practices is necessary? In part 2 of this blog series, we’re going to share how to convince a VP or director to invest in additional SRE practices to strategically improve business results: automated metrics and continuous learning.