August 2020

Here are the Important Differences Between SLI, SLO, and SLA

Aug 26, 2020 By Hannah Culver In Blameless

When embarking on your SRE journey, it can seem daunting to decipher all the acronyms. What are SLOs versus SLAs? What’s the difference between SLIs and SLOs? In this blog post, we’ll cover what SLI, SLO, and SLA mean and how they contribute to your reliability goals.

August SRE Leaders Panel

Aug 26, 2020 By Blameless In Blameless

In this panel, SRE Leaders Talia Nassi (Split.io) and Shelby Spees (Honeycomb.io) joined Blameless Staff SRE Amy Tobey to discuss the benefits of doing testing in production, and practical tips on how to do it safely.

How SLOs Enable Fast, Reliable Application Delivery

Aug 25, 2020 By Blameless Community In Blameless

Application delivery is getting harder each day with the rise in complexity, the demand for services to be always-available, and the increasing pressure on teams to innovate. Service level objectives, or SLOs, can help. In this blog, we’ll discuss how SLOs are the key to modern application delivery, how to manage and measure them, the importance of observability for your SLO solution, and how to begin the journey to reliable application delivery today.

SREview Issue #4 August 2020

Aug 21, 2020 By Blameless Community In Blameless

Here’s the August issue of SREview! This monthly zine features epic Tweets, content, and events happening in the SRE and resilience engineering community.

What is a Kubernetes Operator and Why it Matters for SRE

Aug 20, 2020 By Emily Arnott In Blameless

Kubernetes is an open-source project that “containerizes” workloads and services and manages deployment and configurations. Released by Google in 2015, Kubernetes is now maintained by the Cloud Native Computing Foundation. Since its release, it has become a worldwide phenomenon. The majority of cloud native companies use it, SaaS vendors offer commercial prebuilt versions, and there’s even an annual convention!

Here are the Metrics you Need to Understand Operational Health

Aug 19, 2020 By Blameless Community In Blameless

In recent polls we’ve conducted with engineers and leaders, we’ve found that around 70% of participants used MTTA and MTTR among their main metrics. 20% of participants cited looking at planned versus unplanned work, and 10% said they currently look at no metrics. While MTTA and MTTR are good starting points, they're no longer enough. With the rise in complexity, it can be difficult to gain insights into your services’ operational health.

Resilience in Action, E5: Tammy Bryant and Eric Roberts, The Importance of Glue Work

Aug 14, 2020 By Blameless Community In Blameless

Resilience in Action is a podcast about all things resilience, from SRE to software engineering, to how it affects our personal lives, and more. Resilience in Action is hosted by Blameless Staff SRE Amy Tobey. Amy has been an SRE and DevOps practitioner since before those names existed. She cares deeply about her community of SREs and wants to take what she’s learned over the 20+ years of her career to help others.

Choosing the Right SRE Tools

Aug 13, 2020 By Emily Arnott In Blameless

Implementing SRE practices and culture can be challenging. Fortunately, there are a variety of tools for each aspect of SRE: monitoring, SLOs and error budgeting, incident management, incident retrospectives, alerting, chaos engineering, and more. In this blog, we’ll talk about what to look for in an SRE tool, and how they’ll help you on your journey to reliability excellence.

I Have An SLO. Now What? -Alex Hidalgo

Aug 13, 2020 By Blameless In Blameless

It’s 2020: There is a plethora of data available about measuring SLIs and setting SLO targets. But now that you have this data, what are you actually supposed to do with it? The classic example of “Ship features when you have error budget; focus on reliability when you don’t” is antiquated, too simple, and ignores all of the amazing discussions and decisions you can have with your SLO data. Let’s talk about how you can use SLOs to actually make people happier, from your customers, to your engineers, to your business.

Look Upstream to Solve your Team's Reliability Issues

Aug 12, 2020 By Hannah Culver In Blameless

In “Upstream” by Dan Heath, we explore a variety of problems ranging from homelessness, to high school graduation rates, to the state of sidewalks in different neighborhoods within the same city. In each of these examples, Dan discusses how upstream thinking decreased downstream work. Upstream thinking is characterized as proactive, collective actions to improve outcomes rather than reactions after an issue has already occurred.

How SLOs Enable Fast, Reliable Application Delivery

Aug 6, 2020 By Blameless In Blameless

As enterprises adopt DevOps at scale, there is increasing tension between product, operations, and the business to manage competing incentives around release velocity and risk. In this webinar, you’ll learn how adopting a collaborative approach to implementing service level objectives (SLOs) gives software teams and leaders a shared language to focus engineering efforts and optimize the customer experience.

The Importance of Reliability Engineering

Aug 6, 2020 By Emily Arnott In Blameless

If you’ve spent any time in tech circles lately, there are three letters you’ve surely heard: SRE. Site Reliability Engineering is the defining movement in tech today. Giants like Google and Amazon market their ability to provide reliable service and startups are now investing in reliability as an early priority. But what makes reliability engineering so important?

Improving Postmortems from Chores to Masterclass with Paul Osman

Aug 5, 2020 By Blameless Community In Blameless

At our 2019 Blameless Summit, Paul Osman spoke about how to take postmortems or incident retrospectives to a new level. The following transcript has been lightly edited for clarity. Slides from this talk are available here. Paul Osman: I lead the SRE team at Under Armour. Who here knows about Under Armour as a tech company? Does anybody think about Under Armour as a tech company? Under Armour makes athletic attire, shirts and shoes.

How to Bring Operational Experience to your Development with GitHub's Lauren Rubin

Aug 4, 2020 By Blameless Community In Blameless

At the 2019 Blameless Summit, Lauren Rubin spoke about how to bring operational expertise to development teams. The following transcript has been lightly edited for clarity. Lauren Rubin: I was going to ask for a show of hands of how many people here who are on call right this minute right now. I am actually on call right this minute. I like to live dangerously. If my phone beeps, the specific noise that means I have been paged, I'm sorry, I am going to look at it.
