Latest Posts

SREview Issue #5 September 2020

Sep 15, 2020 By Blameless Community In Blameless

Here’s the September issue of SREview! This monthly zine features epic Tweets, content, and events happening in the SRE and resilience engineering community.

SRE Leaders Panel: Testing in Production

Sep 11, 2020 By Blameless Community In Blameless

Blameless recently had the privilege of hosting some fantastic leaders in the SRE and resilience community for a panel discussion. Our panelists discussed testing in production, how feature flagging and testing can help us do that, and how to get managers on board with testing in production. The transcript below has been lightly edited, and if you’re interested in watching the full panel, you can do so here.

How to Improve the Reliability of a System

Sep 8, 2020 By Emily Arnott In Blameless

Site reliability engineering is a multifaceted movement that combines many practices, mentalities, and cultural values. It looks holistically at how an organization can become more resilient, operating on every level from server hardware to team morale. At each level, SRE is applied to improve the reliability of relevant systems. With such wide-reaching impact, it can be helpful to take time to reevaluate how to improve the reliability of a system.

Industry Experts Explain how to Thrive in a Post-COVID World

Sep 3, 2020 By Blameless Community In Blameless

With complex architectures, gaining visibility into systems is becoming more difficult. Additionally, with the move to remote work, it’s more important than ever to adapt to new modes of work such as asynchronous collaboration. So how do we adjust to these changing times? In a CIO panel hosted by Lightspeed Venture Partners, industry experts came together to discuss this question. Below are key insights from their conversation.

Determining Error Budgets and Policies that Work for Your Team

Sep 2, 2020 By Hannah Culver In Blameless

SLOs are key pillars in organizations’ reliability journeys. But once you’ve set your SLOs, you need to know what to do with them. If they’re only metrics that you’re paged for once in a blue moon, they’ll become obsolete. To make sure your SLOs stay relevant, determine error budgets and policies for your teams. In this blog, we’ll look at the basics of error budgeting, how to set corresponding policies, and how to operationalize SLOs for the long term.

How to Build Your SRE Team

Sep 1, 2020 By Emily Arnott In Blameless

As you implement SRE practices and culture at your organization, you’ll realize everyone has a part to play. From engineers setting SLOs, to management upholding the virtue of blamelessness, to marketing teams conducting retrospectives on email campaigns, there’s no part of an organization that doesn’t benefit from the SRE mentality.

Here are the Important Differences Between SLI, SLO, and SLA

Aug 26, 2020 By Hannah Culver In Blameless

When embarking on your SRE journey, it can seem daunting to decipher all the acronyms. What are SLOs versus SLAs? What’s the difference between SLIs and SLOs? In this blog post, we’ll cover what SLI, SLO, and SLA mean and how they contribute to your reliability goals.

How SLOs Enable Fast, Reliable Application Delivery

Aug 25, 2020 By Blameless Community In Blameless

Application delivery is getting harder each day with the rise in complexity, the demand for services to be always available, and the increasing pressure on teams to innovate. Service level objectives, or SLOs, can help. In this blog, we’ll discuss how SLOs are the key to modern application delivery, how to manage and measure them, the importance of observability for your SLO solution, and how to begin the journey to reliable application delivery today.

SREview Issue #4 August 2020

Aug 21, 2020 By Blameless Community In Blameless

Here’s the August issue of SREview! This monthly zine features epic Tweets, content, and events happening in the SRE and resilience engineering community.

What is a Kubernetes Operator and Why it Matters for SRE

Aug 20, 2020 By Emily Arnott In Blameless

Kubernetes is an open-source project that orchestrates containerized workloads and services, managing their deployment and configuration. Released by Google in 2015, Kubernetes is now maintained by the Cloud Native Computing Foundation. Since its release, it has become a worldwide phenomenon. The majority of cloud native companies use it, SaaS vendors offer commercial prebuilt versions, and there’s even an annual convention!
