November 2022

Postmark + Squadcast Integration: Simplifying Alert Routing

Nov 25, 2022 By Vishal Padghan In Squadcast

Postmark is a simple email delivery system for sending transactional and marketing emails, built to get them delivered to the inbox on time. It also helps reduce email delivery time considerably. If you use Postmark for email delivery, you can integrate it with Squadcast, an end-to-end incident response tool, to route detailed alerts from Postmark to the right users in Squadcast. The post walks through the steps to set up the Postmark and Squadcast integration.

Canary Deployment Benefits & Implementation Guide

Nov 22, 2022 By Myra Nizami In Blameless

All deployment strategies have pros and cons. Find out whether canary deployment is a good fit for your team by looking at how it works and its best practices.

Day in the life of an SRE

Nov 16, 2022 By Emma Stewart-Oram In Civo

We spoke with two members of the SRE team, Alex Blyth and Zulhilmi Zainudin, to learn more about their role at Civo. Through this series, we aim to give you an overview of the different roles we have at Civo and the advice our team has to share. You can discover more about our team in our “day in the life of a Go Dev” and “day in the life of an Intern” blog posts.

CircleCI + Squadcast Integration: Alert Routing Made Easy

Nov 16, 2022 By Vishal Padghan In Squadcast

CircleCI is a continuous integration and continuous delivery (CI/CD) platform that helps teams implement DevOps practices. It is used to build, test, and deploy projects by automating pipelines with jobs. If you use CircleCI, you can now integrate it with Squadcast to route detailed alerts to the right users in Squadcast. The post walks through the steps to set up the CircleCI and Squadcast integration.

Reducing MTTR for DevOps and SREs with PagerDuty Process Automation and InfluxDB

Nov 15, 2022 By Jason Myers In InfluxData

Mean time to resolution (MTTR) is a metric that transcends industry and technology. It’s a measure of how quickly, on average, support teams identify, act on, and resolve IT issues and incidents. Because MTTR directly relates to service quality, maintaining a low MTTR is a critical goal for DevOps and SRE teams. These teams have a vested interest in resolving issues quickly because escalating incidents to higher levels of the support team increases response and resolution times.
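
As a rough illustration of the averaging behind MTTR, here is a minimal Python sketch. The function name `mean_time_to_resolution`, the (opened, resolved) tuple layout, and the sample timestamps are all hypothetical and do not come from the post; real tooling may define the start and end of an incident differently.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Average of (resolved - opened) over a list of (opened, resolved) pairs."""
    if not incidents:
        raise ValueError("need at least one resolved incident")
    # Sum the individual resolution durations, starting from a zero timedelta.
    total = sum((resolved - opened for opened, resolved in incidents), timedelta())
    return total / len(incidents)


# Hypothetical sample data: three incidents lasting 30, 90, and 60 minutes.
incidents = [
    (datetime(2022, 11, 1, 9, 0), datetime(2022, 11, 1, 9, 30)),
    (datetime(2022, 11, 3, 14, 0), datetime(2022, 11, 3, 15, 30)),
    (datetime(2022, 11, 7, 22, 0), datetime(2022, 11, 7, 23, 0)),
]

mttr = mean_time_to_resolution(incidents)
print(mttr)  # 1:00:00
```

A single long-running incident can dominate this mean, so a team might reasonably track a median alongside it.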

My Most Surprising Discoveries from The SRE Report 2023

Nov 15, 2022 By Leo Vasiliou In Catchpoint

I’ve had the honor and privilege of authoring The SRE Report for the last three years. For the 2023 version, this included working with some amazing individuals like Anna Jones, Kurt Andersen, and Steve McGhee. Download The SRE Report 2023 here (no registration required).

3 tips for flexible, adaptive incident management

Nov 15, 2022 By Aaron Lober In Blameless

Incidents should be your best friend. It sounds like a controversial statement. It sounds like a lot of unnecessary work. The truth is, for companies engaged in delivering any online or digital experience, taking this point of view is absolutely E-S-S-E-N-T-I-A-L.

Blameless culture drives incident learning and other key insights from Catchpoint's 2022 SRE Report

Nov 9, 2022 By Emily Arnott In Blameless

SRE is a constantly evolving field, responding to the challenges of increasing reliance on tech and the opportunities of its evolving abilities. Reliability has to remain a step ahead of the cutting edge, whether it’s navigating remote work, implementing AI assistance, or optimizing internal processes. But how do we know that SRE is keeping up? We’re proud and excited to announce the results of the SRE Survey we ran in partnership with Catchpoint.

The 2023 SRE Report provides the broadest independent insights into SRE Practices

Nov 8, 2022 By Catchpoint In Catchpoint

Findings from the 5th edition of The SRE Report show that lower TCO, driving growth, and retaining customers are key business drivers for adopting SRE practices.

Ask a Site Reliability Engineer (SRE)

Nov 8, 2022 By Datadog In Datadog

Site reliability engineering (SRE) can be complicated, and at Datadog, we’ve spent a lot of time thinking about SRE and refining how we implement it. Join Datadog’s Brandon West and Rick Mangi as they provide a brief overview of SRE and its core concepts. This video also contains a Q&A session from the live taping of this panel.

Guide to Service Level Indicators and Setting Service Level Objectives

Nov 8, 2022 By Last9 In Last9

A guide to setting practical Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for your Site Reliability Engineering practices.

Empower the SREs - Conclusions from The SRE Report 2023

Nov 8, 2022 By Steve McGhee In Catchpoint

Let's be honest, nobody loves surveys. Ok, well I sure don't. But surveys satisfy a huge need in our demand for insights into complex human-computer, sociotechnical systems. It turns out that we've been measuring the computer part pretty well, but the humans – not as easy to keep track of. When Google SRE first defined toil as a metric we wanted to reduce, we spent far too long trying to quantify it numerically based on tooling and insights from computer systems.

Introducing a more complete logs forwarding experience

Nov 7, 2022 By Prineet Kaur Bhurji In Platform.sh

One of the key attributes of DevOps and SRE engineers is their ability to meticulously observe and monitor all of their applications, a task that can be achieved more efficiently by forwarding all generated logs to a central endpoint. By centralizing logging, engineers can, at any time, get an accurate overview of all events taking place across their applications from just one place. Storing logs in an external system also helps companies ensure compliance with many certifications.

For incident management, should you build or buy?

Nov 7, 2022 By Aaron Lober In Blameless

Is your incident response held together by a thread? Are you manually recording incident updates in a shared doc? Do you struggle to juggle the incident management workload with your other responsibilities? Does everyone on-call report data the same way? These are all common problems faced by DevOps teams still relying on homegrown incident management tooling.

Service Level Management Process Explained (with Examples)

Nov 3, 2022 By Myra Nizami In Blameless

Service Level Management, or SLM, is defined as the process of negotiating Service Level Agreements and ensuring that they are met. Service Level Management is a fundamental part of SRE and DevOps. It encompasses the expectations and perceptions that both the business and the customer have about the service and its performance. Service level management will include existing and new services as they are added, with the service level agreements (SLAs) being modified accordingly.

To "SRE" Or Not To "SRE"? That is (No Longer) the Question

Nov 3, 2022 By Ariel Dan In Cloudify

The DevOps world is going through rapid changes, and rightfully so. In a world where everything is cloud-based or cloud-native, scale is becoming one of the most critical parameters for business efficiency. In fact, velocity is no longer measured by the number of lines of code a developer produces, but rather by the time it takes the team to release a feature. Focusing on feature releases rather than lines of code forces businesses to switch to a more sophisticated delivery mechanism.
