SRE Culture: How to Put Reliability First

Unreliable services can affect businesses in myriad ways, from slowed development velocity, to unhappy users, to impacted revenue streams. Reliability often takes a backseat to feature releases and other business initiatives that drive development requirements. This post will discuss key elements of SRE practice that you can use to instill a reliability-first culture in your organization, while also meeting business requirements and keeping your users happy.


SRE as Organizational Transformation: Lessons from Activist Organizers

In the software industry’s recent past, the biggest disruptive wave was Agile methodologies. While Site Reliability Engineering is still early in its adoption, those of us who experienced the disruptive transformation of Agile see the writing on the wall: SRE will impact everyone. Any major transformation like this requires a change in culture, a catch-all term for people’s principles and behaviors.

Sponsored Post

Top 5 Benefits of a Site Reliability Platform

One of the most important aspects of a software system is its reliability - and for good reason. With so many digital options available in every industry, customers have little reason to continue utilizing applications or services that experience frequent issues with quality or availability. Therefore, it's critical that organizations invest in the processes and tools that are necessary to ensure system reliability. Utilizing reliability platforms is one way to increase or maintain the quality and reliability of an application. Keep reading for an overview of the functionality provided by reliability platforms and the specific ways in which such platforms provide value to the business.


Top 5 Tools for the Best SRE Stack

Site Reliability Engineering (SRE) can mean different things to different companies, and the operators responsible for reliability typically use a DevOps toolset. However, one thing is certain: SREs combine the skills of software engineers with those of production and operations management to achieve high reliability and ensure that SLO/SLA targets are met. So SREs need a firm grip not only on the technologies involved in the system, but also on the intricacies of production deployments.


SRE2AUX: How Flight Controllers were the first SREs

In the beginning, there were flight controllers. These were a strange breed. In the early days of the US Manned Space Program, most American households, regardless of class or race, knew the names of the astronauts. John Glenn, Alan Shepard, Neil Armstrong. The manned space program was a unifying force of national pride. But no one knew the names of the anonymous men and, later, women who got the astronauts to orbit, to the moon, and most importantly, got them back to Earth.


With SRE, failing to plan is planning to fail

People sometimes think that implementing Site Reliability Engineering (or DevOps for that matter) will magically make everything better. Just sprinkle a little bit of SRE fairy dust on your organization and your services will be more reliable and more profitable, and your IT, product, and engineering teams will be happy. It’s easy to see why people think this way. Some of the world’s most reliable and scalable services run with the help of an SRE team, Google being the prime example.


Playbooks-as-Code: An Overview

You’ve heard of playbooks. But what about playbooks-as-code? How can playbooks be managed as code, and what does that mean for SREs and incident response teams? The short answer is that by automating the processes that are defined in conventional playbooks, playbooks-as-code take incident response and reliability engineering to the next level. For the longer answer, keep reading. This article offers an overview of playbooks-as-code, including how they work and the benefits they offer.


Overview of Incident Lifecycle in SRE

Incidents that disrupt services are unavoidable. But every breakdown is an opportunity to learn and improve. Our latest blog is a deep dive into best practices to follow across the lifecycle of an incident, helping teams build a sustainable and reliable product - the SRE way. As the saying goes, “Every problem we face is a blessing in disguise.”


QA Engineers, This is How SRE will Transform your Role

When implementing SRE, almost every role within your IT organization will change. One of the biggest transformations will be in your Quality Assurance teams. A common misconception is that SRE “replaces” QA. People believe SLOs and other SRE best practices render the traditional role of QA engineering obsolete, as testing and quality shift left in the SDLC. This leads to QA teams resisting SRE adoption.