January 2022

DevOps Tools (All of the Tools Your Team Needs)

Jan 27, 2022 By Emily Arnott In Blameless

Wondering about DevOps Tools? We explain the best tools for every step of the DevOps development process. What are DevOps Tools used for? DevOps relies on effective tools to help teams manage the entire software development lifecycle. These tools can automate tasks, monitor applications, and facilitate sharing of information between teams.

Analyzing SRE Job Postings - From Amazon to Microsoft

Jan 27, 2022 By JP Cheung In Rootly

An analysis of SRE job descriptions from 4 companies highlights what businesses actually expect SREs to do.

Cloud Technology Adoption Trends

Jan 27, 2022 By Peter Claridge In eG Innovations

In the second half of 2021, eG Innovations partnered with the DevOps Institute to conduct an online survey of more than 900 individuals from sysadmin, DevOps, SRE, and other IT backgrounds. You can download the full survey results here: Cloud Technology Adoption Trends | eG Innovations. If surveys and statistics on technology adoption are of interest, we have some other recent ones available, conducted in the last 12 months.

DevOps Methodology | Goals, Principles & Process

Jan 26, 2022 By Emily Arnott In Blameless

Wondering what DevOps Methodology is all about? We will explain what it is, how it works, and the principles and processes that make it successful. What is DevOps Methodology? DevOps methodology is a development process where Development and IT Operations collaborate throughout the lifecycle to facilitate faster deployment of reliable software products.

Five Ways Developers Can Help SREs

Jan 25, 2022 By Mayank Gupta In Squadcast

Reliability is a team game. The more Developers and SREs collaborate, the greater the success of the product. In this blog, we have listed five best practices that developers can adopt to make the SRE's life easier. It is not easy to be a site reliability engineer. Monitoring system infrastructure and aligning it with the key reliability metrics is quite a daunting task. A software engineer's job, by contrast, is to deliver high-quality software.

Introducing CommsFlow for Context-Rich and Timely Updates to All Stakeholders

Jan 25, 2022 By Emily Arnott In Blameless

We’re so excited to announce our latest platform feature, CommsFlow™! This addition to the core Blameless product offering allows teams to keep stakeholders updated as the reliability of services and applications changes. With our new automated and customizable communication flows, on-call, engineering, and business teams feel a sense of accomplishment and, of course, stay informed.

A Primer on the History and Evolution of Incident Management to Today

Jan 21, 2022 By JJ Tang In Rootly

Many of the concepts SREs take for granted about incident management originated with efforts to fight fires in California in the 1970s.

The Business Case for Observability and Site Reliability Engineering

Jan 20, 2022 By Charles Araujo In Moogsoft

Unlike traditional IT Ops, the role of the SRE isn’t simply focused on finding and solving technical problems. The big win for today’s SREs is supporting the organization’s strategic innovation initiatives. With the appropriate observability capabilities, it’s possible to quantify the value that software infrastructure contributes to this innovation effort.

Why SRE Benefits Your Organization's Teams & Your Customers

Jan 18, 2022 By Emily Arnott In Blameless

Wondering why you should choose SRE for your organization? We will explain what it is and all the benefits it can bring to your organization. What are the benefits of SRE?

Implementing SRE at the largest online retailer of NL and Belgium w/ Bart Enkelaar (bol.com) | EP #5

Jan 17, 2022 By StackState In StackState

For the fifth episode of the StackPod, we invited Bart Enkelaar. Bart is a lead SRE at the largest online retailing platform in the Netherlands and Belgium: bol.com. He's been a backend engineer for 13 years and is now responsible for setting up site reliability engineering across more than a hundred DevOps teams. In this episode, Bart and Anthony talk about.

Top 5 Incidents and Outages of 2021

Jan 14, 2022 By Quentin Rousseau In Rootly

An overview of major IT incidents and outages in 2021.

Canary Deployments | The Benefits of an Iterative Approach

Jan 13, 2022 By Emily Arnott In Blameless

At Blameless, we want to embrace all the benefits of the SRE best practices we preach. We’re proud to announce that we’ve started using a new system of feature flagging with canaried and iterative rollouts. This is a system where new releases are broken down and flagged based on the features each part of the release implements. Then, an increasing subset of users is given access to an increasing number of features.

Cloud-Native Development (Everything You Need to Know)

Jan 10, 2022 By Noor-ul-Anam Ruqayya In Blameless

Wondering about Cloud-Native Development? We explain what cloud-native development is and how it can help build fast and reliable applications.

The Importance of Observability for the SRE

Jan 10, 2022 By Alex Romine In Broadcom

The term Site Reliability Engineer (SRE) first appeared at Google in the early 2000s. In Google’s 2016 SRE Book, Benjamin Treynor Sloss wrote that, generally speaking, “an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s).” This means that the SRE teams at Google decide how a system should run in production as well as how to make it run that way.

What Log4j Vulnerability Means for SREs?

Jan 7, 2022 By Weihan Li In Rootly

A summary of the Log4j vulnerability, and key takeaways for SREs.

SRE and the Practice of Practice

Jan 6, 2022 By Matt Davis In Blameless

Part of the trepidation of being on-call is encountering unfamiliar emergency scenarios where we are surprised by suddenly not knowing how to do our jobs. We feel lost and alone, complicated by the world around us, powerless to resolve or even mitigate the problem. On-call need not be a solo affair full of fear and anxiety. There are ways we can employ practice and open collaboration outside of incidents to prepare us better.

The Universal Language: Reliability for Non-Engineering Teams

Jan 5, 2022 By Emily Arnott In Blameless

We talk about reliability a lot in the context of software engineering. We ask questions about service availability, or how important it is for specific users. But when organizations face outages, it becomes immediately obvious that the reliability of an online service or application is something that impacts the entire business with significant costs. A mindset of putting reliability first is a business imperative that all teams should share.

Building an SRE Team with Specialization

Jan 5, 2022 By Emily Arnott In Blameless

As organizations progress in their reliability journey, they may build a dedicated team of site reliability engineers. This team can be structured in two major ways: a distributed model, where SREs are embedded in each project team, providing guidance and support for that team; and a centralized model, where one team provides infrastructure and processes for the entire organization.

Squadcast + Amazon EventBridge: Routing Alerts Made Easy

Jan 4, 2022 By Vishal Padghan In Squadcast

Amazon EventBridge is an AWS serverless event bus service that makes it easier to build event-driven applications. It uses events generated from your applications, integrated Software-as-a-Service (SaaS) applications, and other AWS services. It delivers a stream of real-time data from event sources to target services like AWS Lambda. You can also set up routing rules to determine the destination where you wish to send the data and build decoupled application architectures.
