The latest News and Information on Service Reliability Engineering and related technologies.

Analyzing SRE Job Postings - From Amazon to Microsoft

Jan 27, 2022 By JP Cheung In Rootly

An analysis of SRE job descriptions from 4 companies highlights what businesses actually expect SREs to do.

Cloud Technology Adoption Trends

Jan 27, 2022 By Peter Claridge In eG Innovations

In the second half of 2021, eG Innovations partnered with the DevOps Institute to conduct an online survey of more than 900 individuals from sysadmin, DevOps, SRE, and other IT backgrounds. The full survey results are available for download: Cloud Technology Adoption Trends | eG Innovations. If surveys and statistics on technology adoption are of interest, we have some other recent ones available, conducted in the last 12 months.

Five Ways Developers Can Help SREs

Jan 25, 2022 By Mayank Gupta In Squadcast

Reliability is a team game. The more developers and SREs collaborate, the greater the success of the product. In this blog, we list five best practices that developers can adopt to make an SRE's life easier. It is not easy to be a site reliability engineer. Monitoring system infrastructure and aligning it with the key reliability metrics is quite a daunting task. A software engineer's job, by contrast, is to deliver high-quality software.

A Primer on the History and Evolution of Incident Management to Today

Jan 21, 2022 By JJ Tang In Rootly

Many of the concepts SREs take for granted about incident management originated with efforts to fight fires in California in the 1970s.

The Business Case for Observability and Site Reliability Engineering

Jan 20, 2022 By Charles Araujo In Moogsoft

Unlike traditional IT Ops, the role of the SRE isn’t simply focused on finding and solving technical problems. The big win for today’s SREs is supporting the organization’s strategic innovation initiatives. With the appropriate observability capabilities, it’s possible to quantify the value that software infrastructure contributes to this innovation effort.

Implementing SRE at the largest online retailer of NL and Belgium w/ Bart Enkelaar (bol.com) | EP #5

Jan 17, 2022 By StackState In StackState

For the fifth episode of the StackPod, we invited Bart Enkelaar. Bart is a lead SRE at the largest online retailing platform in the Netherlands and Belgium: bol.com. He's been a backend engineer for 13 years and is now responsible for setting up site reliability engineering across more than a hundred DevOps teams. In this episode, Bart and Anthony talk about.

Top 5 Incidents and Outages of 2021

Jan 14, 2022 By Quentin Rousseau In Rootly

An overview of major IT incidents and outages in 2021.

The Importance of Observability for the SRE

Jan 10, 2022 By Alex Romine In Broadcom

The term Site Reliability Engineer (SRE) first appeared at Google in the early 2000s. In Google’s 2016 SRE Book, Benjamin Treynor Sloss wrote that, generally speaking, “an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s).” This means that the SRE teams at Google decide how a system should run in production as well as how to make it run that way.

What Log4j Vulnerability Means for SREs?

Jan 7, 2022 By Weihan Li In Rootly

A summary of the Log4j vulnerability, and key takeaways for SREs.

Squadcast + Amazon EventBridge: Routing Alerts Made Easy

Jan 4, 2022 By Vishal Padghan In Squadcast

Amazon EventBridge is a serverless event bus service from AWS that makes it easier to build event-driven applications. It ingests events generated by your own applications, integrated Software-as-a-Service (SaaS) applications, and other AWS services, and delivers a stream of real-time data from those event sources to target services such as AWS Lambda. You can also set up routing rules that determine where the data is sent, which lets you build decoupled application architectures.
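
To make the idea of routing rules concrete, here is a minimal sketch in plain Python of rule-based event routing. It is not the EventBridge API and makes no AWS calls. It only imitates the exact-match style of event patterns, where each pattern field lists the values it accepts. The source name `monitoring.app`, the target names, and the helper functions `matches` and `route` are all hypothetical, chosen for illustration.

```python
from typing import Any, Dict, List, Tuple

Event = Dict[str, Any]
Pattern = Dict[str, Any]


def matches(pattern: Pattern, event: Event) -> bool:
    """Return True if every field named in the pattern matches the event.

    A list in the pattern means "any of these values"; a nested dict is
    matched recursively against the same key in the event. Fields that the
    pattern does not mention are ignored.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def route(event: Event, rules: List[Tuple[Pattern, str]]) -> List[str]:
    """Return the targets of all rules whose pattern matches the event."""
    return [target for pattern, target in rules if matches(pattern, event)]


# Hypothetical rules: critical alerts page the on-call engineer,
# and every alert from the same source is archived.
RULES = [
    ({"source": ["monitoring.app"], "detail": {"severity": ["critical"]}}, "page-on-call"),
    ({"source": ["monitoring.app"]}, "archive"),
]

event = {
    "source": "monitoring.app",
    "detail-type": "alert",
    "detail": {"severity": "critical", "service": "checkout"},
}

print(route(event, RULES))  # prints ['page-on-call', 'archive']
```

Because the producer only emits events and the rules decide which targets receive them, a new target can be added by appending a rule, without touching the producer. That is the decoupling the paragraph above refers to.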
