SRE

The latest news and information on Site Reliability Engineering and related technologies.

Software Metrics Every SRE Team Should Measure

Aug 31, 2022 By Myra Nizami In Blameless

Software metrics give important insight into the performance of your product, but which ones matter most to SRE teams? How do you decide which metrics to track?

Round Robin Escalation: An Efficient Way to Distribute On-Call Responsibilities

Aug 30, 2022 By Vishal Padghan In Squadcast

Organizations now handle a high volume of incidents every day. With so much happening, responders can be overwhelmed and may end up de-prioritizing important incidents. Hence, it is important to have an efficient on-call scheduling and escalation process in place. In this blog, we will explore how Round Robin Escalations can help distribute on-call load and set up efficient on-call schedules.
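As a rough illustration of the round-robin idea, the sketch below hands each new incident to the next responder in a fixed rotation. It is a generic example, not Squadcast's implementation; the function name `round_robin_assigner`, the responder names, and the incident IDs are all hypothetical.

```python
from itertools import cycle


def round_robin_assigner(responders):
    """Return a function that assigns each incident to the next responder in turn."""
    if not responders:
        raise ValueError("at least one responder is required")
    rotation = cycle(responders)  # endless iterator over the on-call list

    def assign(incident_id):
        # Pair the incident with whoever is next in the rotation.
        return incident_id, next(rotation)

    return assign


# Hypothetical on-call list and incident IDs, for illustration only.
assign = round_robin_assigner(["alice", "bob", "carol"])
assignments = [assign(f"INC-{n}") for n in range(1, 6)]
```

With three responders and five incidents, the fourth incident wraps back to the first responder, so no one person absorbs the whole load.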

What is reliability engineering?

Aug 30, 2022 By Aimee Pearcy In Reliably

Reliability engineering focuses on the ability of systems to perform as intended and function without failure in a specified environment for the required duration. Reliability engineering can be applied across the entire lifecycle of software development. It is designed to increase the dependability of a product by detecting potential reliability issues early in the software development cycle and correcting the causes of failures that do occur.

Are Code Freezes Still Needed?

Aug 30, 2022 By Mbaoma Mary In Reliably

A code freeze means that no code can be altered during the freeze period and developers make no further changes. The only exception is critical flaws, which developers may fix to the extent required to correct them. Developers typically observe a code freeze during the final phase of software development, when the product is ready for delivery.

The SRE's Quick Guide to Kubectl Logs

Aug 28, 2022 By Eyal Katz In Lightrun

Logs are key to monitoring the performance of your applications. Kubernetes offers a command-line tool called kubectl for interacting with the control plane of a Kubernetes cluster. This tool provides debugging, monitoring, and, most importantly, logging capabilities. There are many great tools for SREs, but Kubernetes in particular supports Site Reliability Engineering principles through its capacity to standardize the definition, architecture, and orchestration of containerized applications.

SRE vs. DevOps: Differences and Similarities

Aug 26, 2022 By Emiliano Pardo Saguier In InvGate

Organizations scramble to adopt new frameworks and methodologies to make their software more scalable, and they need to do so reliably, without causing more problems. Enter Site Reliability Engineering (SRE), a set of practices introduced by a Google engineer. But how does it stack up against frameworks like DevOps? Both DevOps and SRE enhance the software development and product release cycle.

Healthchecks + Squadcast Integration: Routing Alerts Made Easy

Aug 26, 2022 By Vishal Padghan In Squadcast

Healthchecks is a cron job monitoring service that listens for HTTP requests and email messages ("pings") from your cron jobs and scheduled tasks ("checks"). You update your job to send an HTTP request to the ping URL every time it runs. When your job does not ping Healthchecks.io on time, you receive an alert. If you use Healthchecks for your monitoring needs, you can now integrate it with Squadcast to route detailed alerts from Healthchecks to the right users in Squadcast.
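The ping pattern described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not an official client: `run_and_ping` and `_http_get` are hypothetical names, the ping URL is a placeholder you would copy from your own check, and appending `/fail` to signal a failed run is an assumption to verify against the Healthchecks documentation.

```python
import urllib.request


def _http_get(url, timeout=10):
    """Send a plain HTTP GET and discard the response body."""
    with urllib.request.urlopen(url, timeout=timeout):
        pass


def run_and_ping(job, ping_url, send=_http_get):
    """Run `job`, then ping the monitoring service with the outcome."""
    try:
        job()
        succeeded = True
    except Exception:
        succeeded = False

    # Assumption: a "/fail" suffix on the ping URL reports a failed run.
    target = ping_url if succeeded else ping_url.rstrip("/") + "/fail"
    try:
        send(target)
    except OSError:
        # A monitoring outage should not break the job itself.
        pass
    return succeeded
```

A scheduled task would call `run_and_ping(nightly_backup, "<your ping URL>")`, where `nightly_backup` is whatever callable does the work. The `send` parameter exists only so the HTTP call can be swapped out, for example in tests.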

Introduction to Service Catalog | Service Ownership | Service Classification Squadcast

Aug 26, 2022 By Squadcast In Squadcast

To make service management a breeze, we bring you our improved Service Catalog. The Service Catalog is designed to improve Service Classification and bring more transparency to Service Ownership within your org. This video explains how a consolidated summary of all active services in a single dashboard can help you better track your service health.

What are Runbooks? And why are they needed?

Aug 25, 2022 By Vardhan NS In Squadcast

Imagine being an Ops engineer in a team just struck by tragedy. Alarms start ringing, and incident response is in full force. It may sound like the situation is under control. WRONG! There's panic everywhere. The on-call team is scrambling for the heavenly door to redemption. But the one thing that doesn't stop: stakeholder inquiries. This situation is bad. But it could be worse. Now imagine being a less-experienced Ops engineer in a relatively small on-call team struck by tragedy. If you don't have sufficient guidance, let alone moral support, you're toast.

Using StatusPage at Squadcast | SRE Best Practices | Squadcast

Aug 25, 2022 By Squadcast In Squadcast

Let your customers know how your services are doing without them having to ask. One of the core principles of SRE is transparency, and status pages help you communicate the status of your services to your customers at all times, rather than you learning about it from support tickets logged by your customers.
