The latest news and information on Service Reliability Engineering and related technologies.

Outages ITOps professionals are thankful to avoid

Dec 6, 2022 By meshIQ In meshIQ

As we settle into the time of year when we reflect on what we're thankful for, we tend to focus on important basics such as health, family and friends. But on a professional level, IT operations (ITOps) practitioners are thankful to avoid disastrous outages that can cause confusion, frustration, lost revenue and damaged reputations. The very last thing ITOps, network operations center (NOC) or site reliability engineering (SRE) teams want while eating their turkey and enjoying time with family is to get paged about an outage. Outages can be extremely costly: $12,913 per minute, in fact, and up to $1.5 million per hour for larger organizations.

Toil: Still Plaguing Engineering Teams

Dec 6, 2022 By Damon Edwards In PagerDuty

Our industry has always had localized expressions for work that was necessary but didn’t move the company forward. The SRE movement calls this type of work “toil.” The concept of toil is a unifying force because it provides an impartial framework for identifying — then containing — the work that takes up our time, blocks people from fulfilling their engineering potential, and doesn’t move the company forward.

Postmark + Squadcast Integration: Simplifying Alert Routing

Nov 25, 2022 By Vishal Padghan In Squadcast

Postmark is a simple email delivery system for sending transactional and marketing emails that ensures they reach the inbox on time, every time. It also helps reduce email delivery time considerably. If you use Postmark for your email delivery requirements, you can integrate it with Squadcast, an end-to-end incident response tool, to route detailed alerts from Postmark to the right users in Squadcast. The steps below will help you set up the Postmark and Squadcast integration.

Day in the life of an SRE

Nov 16, 2022 By Emma Stewart-Oram In Civo

We spoke with two members of the SRE team, Alex Blyth and Zulhilmi Zainudin, to learn more about their roles at Civo. Through this series, we aim to provide you with an overview of the different roles we have at Civo and the advice our team has to offer. You can discover more about our team in our “day in the life of a Go Dev” and “day in the life of an Intern” blogs.

CircleCI + Squadcast Integration: Alert Routing Made Easy

Nov 16, 2022 By Vishal Padghan In Squadcast

CircleCI is a continuous integration and continuous delivery (CI/CD) platform that helps teams implement DevOps practices. It is used to build, test, and deploy projects by automating pipelines with jobs. If you use CircleCI to implement your DevOps practices, you can now integrate it with Squadcast to route detailed alerts to the right users in Squadcast. The steps below will help you set up the CircleCI and Squadcast integration.

Reducing MTTR for DevOps and SREs with PagerDuty Process Automation and InfluxDB

Nov 15, 2022 By Jason Myers In InfluxData

Mean time to resolution (MTTR) is a metric that transcends industry and technology. It’s a measure of how quickly, on average, support teams identify, act on, and resolve IT issues and incidents. Because MTTR directly relates to service quality, maintaining a low MTTR is a critical goal for DevOps and SRE teams. These teams have a vested interest in resolving issues quickly because escalating incidents to higher levels of the support team increases response and resolution times.
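As a rough illustration of the "on average" part of that definition, the sketch below computes MTTR as the mean of (resolved time minus opened time) over a set of incidents. The function name `mean_time_to_resolution` and the sample timestamps are hypothetical and do not come from the post; real teams may define the start and end of an incident differently.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Return the average duration between opening and resolving an incident.

    `incidents` is a list of (opened, resolved) datetime pairs.
    This is an illustrative definition, not any vendor's formula.
    """
    if not incidents:
        raise ValueError("need at least one resolved incident")
    # Sum the per-incident durations, starting from a zero timedelta.
    total = sum((resolved - opened for opened, resolved in incidents), timedelta())
    return total / len(incidents)


# Hypothetical sample data: three incidents lasting 30, 90, and 60 minutes.
incidents = [
    (datetime(2022, 11, 1, 9, 0), datetime(2022, 11, 1, 9, 30)),
    (datetime(2022, 11, 3, 14, 0), datetime(2022, 11, 3, 15, 30)),
    (datetime(2022, 11, 7, 2, 15), datetime(2022, 11, 7, 3, 15)),
]

mttr = mean_time_to_resolution(incidents)
print(mttr)  # 1:00:00
```

Because the result is a plain average, a single long-running incident can pull the figure up noticeably, which is one reason escalations that lengthen resolution time matter.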

My Most Surprising Discoveries from The SRE Report 2023

Nov 15, 2022 By Leo Vasiliou In Catchpoint

I’ve had the honor and privilege of authoring The SRE Report for the last three years. For the 2023 version, this included working with some amazing individuals like Anna Jones, Kurt Andersen, and Steve McGhee. Download The SRE Report 2023 here (no registration required).

The 2023 SRE Report provides the broadest independent insights into SRE Practices

Nov 8, 2022 By Catchpoint In Catchpoint

Findings from the 5th edition of The SRE Report show that lowering TCO, driving growth, and retaining customers are key business drivers for adopting SRE practices.

Empower the SREs - Conclusions from The SRE Report 2023

Nov 8, 2022 By Steve McGhee In Catchpoint

Let's be honest, nobody loves surveys. Ok, well I sure don't. But surveys satisfy a huge need in our demand for insights into complex human-computer, sociotechnical systems. It turns out that we've been measuring the computer part pretty well, but the humans – not as easy to keep track of. When Google SRE first defined toil as a metric we wanted to reduce, we spent far too long trying to quantify it numerically based on tooling and insights from computer systems.

Ask a Site Reliability Engineer (SRE)

Nov 8, 2022 By Datadog In Datadog

Site reliability engineering (SRE) can be complicated, and at Datadog, we’ve spent a lot of time thinking about SRE and refining how we implement it. Join Datadog’s Brandon West and Rick Mangi as they provide a brief overview of SRE and its core concepts. This video also contains a Q&A session from the live taping of this panel.
