The latest News and Information on Service Reliability Engineering and related technologies.

Getting Started with Site Reliability Engineering

Aug 16, 2021 By Robert Ross In FireHydrant

Site Reliability Engineer (SRE) is one of the fastest-growing jobs in tech, with LinkedIn reporting 34% YoY growth in 2020 and over 9,000 openings in its Emerging Jobs Report. If you’re new to SRE and exploring it as a career path, understand that it can be a challenging but rewarding experience. Here are some quick tips on how you can get started with SRE and jump-start your career.

How to Improve Upon Google's Four Golden Signals of Monitoring

Aug 13, 2021 By JJ Tang In Rootly

The Four Golden Signals of monitoring and observability get a lot of things right. But they could be even better.

Incident Management Goes to the Olympics

Aug 5, 2021 By Quentin Rousseau In Rootly

A look at outages and disruptions to the IT systems that power the Olympics, from 1996 to today.

Demystifying DevOps and SRE

Aug 4, 2021 By James Samuel In Squadcast

How different are DevOps and SRE? Are they related to each other? In this blog, James Samuel sheds light on the similarities and differences between SRE and DevOps, followed by possible ways to structure an SRE team in your organization. SRE and DevOps are two terms that people often confuse. People often ask: should I hire a DevOps Engineer or a Site Reliability Engineer? What is the difference between SRE and DevOps, and which one do I need? In this post, I attempt to shed some light.

The Unique Reliability Engineering Requirements of Microservices

Jul 30, 2021 By JJ Tang In Rootly

Although the fundamental concepts of site reliability engineering are the same in any environment, SREs must adapt practices to different technologies, like microservices.

Most frequently asked questions surrounding Google's Cloud Operations Sandbox

Jul 29, 2021 By Nir Sharma In Squadcast

Cloud Operations Sandbox serves as a simulation tool for budding SREs to learn best practices from Google and apply them to real cloud services. In this blog, we have compiled a list of FAQs surrounding the use of Google's Cloud Operations Sandbox. The Google SRE sandbox provides an easy way to get started with the core skills you need to become an SRE.

How to Notify Your Team of Errors: Email vs. Slack vs. PagerDuty

Jul 26, 2021 By LogDNA In Mezmo

Site Reliability Engineering (SRE) and Operations (Ops) teams heavily rely on notifications. We use them to know what’s going on with application workloads and how applications are performing. Notifications are critical to ensuring SREs and Ops teams can resolve errors and reduce downtime. They’re also crucial when monitoring environments — not only when running in production but also during the dev-test or staging phase.

When You Do DevSecOps, Don't Forget the SREs

Jul 21, 2021 By Quentin Rousseau In Rootly

It's time to break down the silos separating SREs from security engineers.

SRE's Guide to Chaos & Observability

Jul 20, 2021 By Gremlin In Gremlin

Today’s distributed, cloud-based environments are incredibly complex. Not only does each component depend on many others, but modern systems are also highly dynamic—changing frequently as teams push new code or make updates to infrastructure. Taming this complexity to ensure reliability requires end-to-end observability to understand how components depend on each other. Additionally, proactive Chaos Engineering combined with AI-driven observability lets you uncover “unknown unknowns” that impact how your system will respond to different failure scenarios.

Upcoming trends in DevOps and SRE

Jul 15, 2021 By Biju Chacko In Squadcast

DevOps and SRE are domains with rapid growth and frequent innovation. With this blog you can explore the latest trends in DevOps and SRE and stay ahead of the curve. The past decade has seen widespread adoption of DevOps methodologies in software development. Unsurprisingly, as the needs of users change, DevOps techniques have evolved as well. In this blog we will look at the trends that are most likely to have a significant impact in the coming years.
