SRE

The latest news and information on Site Reliability Engineering (SRE) and related technologies.

7 DevOps Principles Every Team Needs to Practice

Dec 9, 2021 By Myra Nizami In Blameless

If you are optimizing your current DevOps processes, we can help. We’ll explain the 7 key principles of DevOps and how to put them into practice.

SRE Incident Management: Overview, Techniques, and Tools

Dec 8, 2021 By Jacob Hall In Dotcom-Monitor

In the world of a site reliability engineer (SRE), failure is not only an option but an expectation. Systems, web applications, servers, and devices are all prone to performance issues and unexpected outages at some point; this is unavoidable. These failures can lead to large revenue losses, lost customer trust, and, depending on the industry, fines. Fortunately, incident management is one of the core SRE practices for limiting the disruption caused by unexpected issues.

Who Needs Site Reliability Engineers (SREs)?

Dec 3, 2021 By JJ Tang In Rootly

Although every company can benefit from SREs, some need SREs more than others.

What can SREs do to make holiday season's peak traffic less chaotic?

Dec 3, 2021 By Vardhan NS In Squadcast

The holiday season's peak traffic is the most challenging period for SREs and on-call engineers. In this blog post, we highlight what SREs can do to make the holiday season less chaotic. The recently concluded Black Friday weekend was likely the most challenging shift for on-call engineers in the retail and e-commerce sectors. Because such peak-traffic events push systems to their limits, engineering teams face considerable pressure as they prepare for them.

DevOps Workflow | A Complete Guide & Best Practices

Dec 2, 2021 By Myra Nizami In Blameless

Curious about DevOps workflow? We explain the DevOps process, how automation relates to workflow, and best practices for workflow design. DevOps is a methodology in which development and operations teams work together throughout the development process. A workflow is the sequence in which tasks occur, and a DevOps workflow relies heavily on automation. Using DevOps, teams can increase collaboration and build more stable, manageable processes.

Site Reliability Engineering, Observability, and the Tradeoffs of Modern Software

Dec 2, 2021 By Jason Bloomberg In Moogsoft

This blog post defines SRE by explaining SLOs and error budgets, highlighting the innovation vs. reliability tradeoff.
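As a rough illustration of the error-budget idea: for a time-based availability SLO such as 99.9% over a 30-day window, the error budget is the remaining 0.1% of the window that the service may be unavailable. The sketch below assumes that simple time-based definition (request-based SLOs work similarly); the function names and the 30-day default are illustrative assumptions, not taken from the post.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for a time-based availability SLO."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo) * window_minutes


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative means the SLO is breached)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget


# 30 days * 1440 minutes/day * 0.001 = 43.2 minutes of budget
budget = error_budget_minutes(0.999)
print(f"{budget:.1f} minutes")
# After 10.8 minutes of downtime, three quarters of the budget is left
print(f"{budget_remaining(0.999, 10.8):.0%} remaining")
```

When the remaining fraction approaches zero, teams commonly slow down risky releases until the budget recovers; that is one numeric way to express the innovation vs. reliability tradeoff.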

MTTR | Mean Time to Recovery Explained

Nov 30, 2021 By Noor-ul-Anam Ruqayya In Blameless

Curious about MTTR? We explain what the mean time to recovery is, why it matters to your development team, and how to reduce it.
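MTTR is, at its core, an average: total recovery time divided by the number of incidents. The sketch below assumes each incident is recorded as a hypothetical (outage start, service restored) pair; some teams measure from detection or acknowledgement instead, which changes the inputs but not the arithmetic.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average time from the start of each incident to restoration of service."""
    if not incidents:
        raise ValueError("MTTR is undefined when there are no incidents")
    total = sum((recovered - started for started, recovered in incidents), timedelta())
    return total / len(incidents)


# Hypothetical incident log: (outage start, service restored)
incidents = [
    (datetime(2021, 11, 1, 9, 0), datetime(2021, 11, 1, 9, 30)),     # 30 minutes
    (datetime(2021, 11, 12, 14, 0), datetime(2021, 11, 12, 15, 0)),  # 60 minutes
    (datetime(2021, 11, 20, 2, 15), datetime(2021, 11, 20, 2, 30)),  # 15 minutes
]
print(mean_time_to_recovery(incidents))  # 105 minutes / 3 incidents
```

Because MTTR is a mean, a single long outage can dominate it, so it is often read alongside the distribution of individual recovery times.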

6 Steps SREs Should Take to Prepare for Black Friday and Cyber Monday 2021

Nov 24, 2021 By Quentin Rousseau In Rootly

Six tips on how site reliability engineers (SREs) can prepare for the reliability challenges of Black Friday and Cyber Monday 2021.

How Sabre is using SRE to lead a successful digital transformation

Nov 22, 2021 By Kenny Kon In Google Operations

Editor’s note: Today we hear from Kenny Kon, an SRE Director at Sabre. Kenny shares how Sabre has successfully adopted Google’s SRE framework by leveraging its partnership with Google Cloud. As a leader in the travel industry, Sabre Corporation drives innovation and develops solutions that help airlines, hotels, and travel agencies transform the traveler experience and satisfy the ever-evolving needs of their customers.

SLA vs. SLO (Differences Explained)

Nov 22, 2021 By Emily Arnott In Blameless

Wondering about SLAs and SLOs? We explain service level agreements and service level objectives, how they differ, and why each matters. What are the major differences between service level agreements (SLAs) and service level objectives (SLOs)? An SLA is a legal agreement between the business and the customer that includes a reliability target and the consequences of failing to meet it. An SLO is an internal reliability target, measured against metrics that reflect how customers experience the service.
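To make the distinction concrete, the sketch below checks one measured availability figure against both targets. It assumes availability is measured as the ratio of successful requests to total requests, and that the internal SLO is set tighter than the contractual SLA (a common practice, assumed here, so that missing the SLO gives an early warning before contractual consequences apply). The `Targets` class, `evaluate` function, and the 99.9%/99.5% figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Targets:
    slo: float  # internal objective, e.g. 0.999
    sla: float  # contractual commitment, e.g. 0.995; assumed looser than the SLO


def evaluate(good_requests: int, total_requests: int, targets: Targets) -> str:
    """Classify measured availability against an internal SLO and an external SLA."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    availability = good_requests / total_requests
    if availability >= targets.slo:
        return "meets SLO"
    if availability >= targets.sla:
        return "misses SLO, still within SLA"
    return "breaches SLA"


targets = Targets(slo=0.999, sla=0.995)
print(evaluate(999_500, 1_000_000, targets))  # 99.95% availability
print(evaluate(997_000, 1_000_000, targets))  # 99.7% availability
print(evaluate(990_000, 1_000_000, targets))  # 99.0% availability
```

The middle case is the one the SLO exists for: the team has missed its own objective and can react before the customer-facing agreement is at risk.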