The latest news and information on Site Reliability Engineering and related technologies.

How to Set up SLOs and Configure SLIs in Squadcast | Tracking Error Budget & Burn Rates | Squadcast

This video will help you define and monitor Service Level Objectives for your services, and set up and track error budget burn rates in Squadcast. A Service Level Objective (SLO) is a reliability target, measured by a Service Level Indicator (SLI), and sometimes serves as a safeguard for a Service Level Agreement (SLA). SLOs represent customer happiness and guide the development team’s velocity.
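
As a rough illustration of the terms above, the sketch below computes an error budget and a burn rate from an SLO target and an SLI measured as a ratio of bad events to total events. It is a minimal sketch of the general SRE arithmetic, not Squadcast's implementation; the function names and the sample numbers are assumptions made for this example.

```python
def error_budget(slo_target: float) -> float:
    """Return the fraction of events allowed to fail under the SLO.

    slo_target is a ratio such as 0.999 for a 99.9% objective.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Return how fast the error budget is being consumed.

    A value of 1.0 means the budget would be used up exactly at the end
    of the SLO window; a value above 1.0 means it would run out early.
    """
    budget = error_budget(slo_target)
    if total_events == 0:
        # No traffic in the window, so nothing was burned.
        return 0.0
    observed_error_rate = bad_events / total_events
    return observed_error_rate / budget


if __name__ == "__main__":
    # Hypothetical numbers: 20 failed requests out of 10,000 against a 99.9% SLO.
    rate = burn_rate(bad_events=20, total_events=10_000, slo_target=0.999)
    print(f"burn rate: {rate:.2f}")
```

In this hypothetical case the observed error rate (0.2%) is about twice the allowed rate (0.1%), so the budget is burning at roughly twice the sustainable pace.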

Suppression Rules in Squadcast | Minimise Alert fatigue | Suppress Non-Actionable Alerts | Squadcast

This video talks about Alert suppression in Squadcast. Alert Suppression helps you avoid alert fatigue by suppressing notifications for non-actionable alerts. Squadcast will suppress the incidents that match any of the Suppression Rules you create for your Services. These incidents will go into the Suppressed state and you will not get any notifications for them.

SRE Report 2023: Findings From the Field - Toil

Toil. Few other words have the same visceral impact for SREs as their four-letter nemesis: toil. Although pretty much everyone recognizes and agrees that toil is bad, the term is frequently misused in everyday conversation. In common English usage, toil is defined as “long strenuous fatiguing labor”. As a term of art in the SRE profession, “toil” has several very specific characteristics which distinguish it from other sorts of work that people spend time on.

Using Tagging and Routing Rules in Squadcast | Incident Classification | Event Tagging | Squadcast

Event Tagging is a rule-based auto-tagging system that lets you define custom tags based on incident payloads. The tags are assigned automatically to incidents when they are triggered. This video explains how to create Tagging rules for efficient Incident Classification.

Announcing our improved Schedules & On-Call Rotations

Hey folks! We are super excited to announce that our schedules feature has gone through a bit of an update. Well, more than a bit 🙂. We’ve gone through the feature with a fine-toothed comb and introduced a bunch of UI and functional improvements which we hope will help you achieve one thing: set up, edit and manage your on-call schedules at scale in a matter of minutes. (Yes, that was three things, but it was tough to condense it to ONE thing.)

Adding Incident Watchers in Squadcast | Incident Notifications and Updates | Squadcast

This video talks about Squadcast's Incident Watchers feature. In Squadcast, any user or stakeholder can subscribe to an incident as a Watcher and observe it without being an active responder. Watchers can choose to receive notifications for every update to the incident, or customize their watch options to be notified only about selected updates.

SRE Vs. DevOps: A Simple Breakdown Of The Differences

You know this already. Regardless of your size, you must keep up with technological developments in your industry — and, increasingly, in other industries, even those that seem unrelated. Embracing disruption can enable you to increase your market share, revenue, and profit margins. Delegating some development and operations responsibilities to Site Reliability Engineering (SRE) experts allows developers to innovate and create new solutions faster.

Webinar Recap: How Observability Impacts SRE, Development, and Security Teams

In today’s fast-paced and constantly evolving digital landscape, observability has become a critical component of effective software development. Companies increasingly rely on machine and telemetry data to fix customer problems, refine software and applications, and enhance security. However, while more data has given teams more insights, the value derived from that data isn’t keeping pace with its growth. So how can these teams derive more value from telemetry data?

Analytics in Squadcast | Visualize Team and Organization Level Analytics | MTTA MTTR | Squadcast

Analyzing incident data plays a key role in doing SRE well. Squadcast's Analytics Dashboard helps you analyze the performance of your Organization or Team over a given time period. It also gives you more insight into past outages that affected your systems.
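
For readers unfamiliar with the MTTA and MTTR metrics named in the title, here is a minimal sketch of how they can be computed from incident timestamps. It is illustrative only: the `Incident` class, its field names, and the choice to measure both metrics from the trigger time are assumptions for this example, and Squadcast's own definitions may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Incident:
    """Hypothetical incident record with the three timestamps we need."""
    triggered_at: datetime
    acknowledged_at: datetime
    resolved_at: datetime


def mean_duration(durations: List[timedelta]) -> timedelta:
    """Average a list of durations; an empty list yields zero."""
    if not durations:
        return timedelta(0)
    return sum(durations, timedelta(0)) / len(durations)


def mtta(incidents: List[Incident]) -> timedelta:
    """Mean time to acknowledge: trigger to acknowledgement."""
    return mean_duration([i.acknowledged_at - i.triggered_at for i in incidents])


def mttr(incidents: List[Incident]) -> timedelta:
    """Mean time to resolve: trigger to resolution (assumed definition)."""
    return mean_duration([i.resolved_at - i.triggered_at for i in incidents])


if __name__ == "__main__":
    # Made-up sample data for two incidents on the same day.
    sample = [
        Incident(datetime(2023, 1, 1, 10, 0), datetime(2023, 1, 1, 10, 5),
                 datetime(2023, 1, 1, 10, 35)),
        Incident(datetime(2023, 1, 1, 12, 0), datetime(2023, 1, 1, 12, 15),
                 datetime(2023, 1, 1, 13, 25)),
    ]
    print("MTTA:", mtta(sample))
    print("MTTR:", mttr(sample))
```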

What are Network Operation Centers (NOC) and how do NOC teams work?

Modern-day markets are highly competitive, and to foster stronger customer relations, businesses strive to be always available and operational. As a result, they invest heavily in higher uptime and in dedicated teams that constantly monitor the performance of the organization's IT resources. In this blog, we will explore what NOC teams are and why they are important.