December 2022

SLA Vs SLO: Tutorial & Examples

Dec 30, 2022 By Squadcast In Squadcast

Service level agreements (SLA) and service level objectives (SLO) are increasing in popularity because modern applications rely on a complex web of sub-services such as public cloud services and third-party APIs to operate, making service quality measurement an operational necessity for serving a demanding market. This article focuses on the similarities and differences between SLAs and SLOs, explains the intricacies involved in implementing them, presents a case study, and finally recommends industry best practices for implementing them.

Read Post

Looking back at our journey through 2022

Dec 30, 2022 By Squadcast Community In Squadcast

We are on the cusp of 2023 🗓️ with a bag full of interesting memories. Before we wrap up this year-end's celebrations, we'd like to look back and highlight some notable events that took place at Squadcast. Squadcast has grown by leaps and bounds over the past 12 months in our journey towards becoming an integrated Reliability Workflow platform. 😎

Read Post

15 DevOps and SRE Tools you Should Know About in 2023

Dec 28, 2022 By Eduardo Messuti In Statuspal

With the constantly evolving landscape of technology, professionals in the DevOps and SRE fields need to stay up-to-date and knowledgeable about the tools and practices driving the industry forward. Whether you are just starting your career or have been working in DevOps or SRE for years, this post will provide valuable insights and information on the tools you should be familiar with as we head into 2023.

Read Post

Squadcast + Hund Integration: A Simplified Approach for effective Alert Routing

Dec 28, 2022 By Vishal Padghan In Squadcast

Hund is a versatile Service Monitoring & Communication tool. It helps monitor services and keeps your audience informed about any status changes automatically through a status page. If you use Hund for monitoring and management requirements, you can integrate it with Squadcast, an end-to-end incident response tool, to route detailed alerts from Hund to the right users in Squadcast.

Read Post

Getting Amazon GuardDuty alerts via SNS Endpoint

Dec 27, 2022 By Vishal Padghan In Squadcast

Monitoring your infrastructure and safeguarding it against threats is not easy. Setting up the infrastructure, then monitoring, collecting, and analyzing information for threat detection is a cumbersome process. This is where a security monitoring service like Amazon GuardDuty can help. In this blog, we will explore the Amazon GuardDuty service and discuss how integrating it with Squadcast can help you route alerts to the right users for quick and efficient incident response.

Read Post

The importance of structured communication in the world of SRE

Dec 27, 2022 By Saurabh Hirani In Last9

How you communicate helps build your 9s. In the world of Site Reliability Engineering, this is crucial. How do you do it?

Read Post

Maximize efficiency with Terraformer: Manage Squadcast resources via IaC

Dec 23, 2022 By Vardhan NS In Squadcast

Ever since Terraform was first launched by HashiCorp, infrastructure teams have been quick to leverage its functionality, because deploying infrastructure via code became so much easier and less error-prone. This became a great way to deploy new infrastructure with custom configurations, but what about managing cloud infrastructure that is already defined? Can Terraform be used to make changes to it? Or can it be used to deploy the same configurations to new environments?

Read Post

Assessing Observability Maturity at Danske Bank

Dec 22, 2022 By StackState In StackState

To ensure reliability, IT operations teams today require a deeper understanding of systems than monitoring alone can provide. In this session, you'll hear insights from Danske Bank about how their observability journey started, the obstacles encountered along the way, what they've achieved in observability so far and, finally, how they measure the maturity of their observability practice.

View Video

SRE Best Practices

Dec 16, 2022 By Squadcast In Squadcast

Site Reliability Engineering (SRE) is a practice that emerged at Google because of its need for highly reliable and scalable systems. SRE unifies operations and development teams and implements DevOps principles to ensure system reliability, scalability, and performance. There's plenty of documentation on tactics for adopting automation and implementing infrastructure as code, but practical ops-focused SRE best practices based on real-world experience are harder to find. This article will explore 6 SRE best practices based on feedback from SREs and technical subject matter experts.

Read Post

Introduction to Kubernetes Imperative Commands

Dec 16, 2022 By Squadcast Community In Squadcast

Kubernetes was born out of the need to make complex applications highly available, scalable, portable, and independently deployable as small microservices. It also eases the adoption of DevOps processes and helps you set up modern Incident Response strategies to enhance the reliability of your applications.

Read Post

Creating Routing Rules | Creating Incident Routing Flows | Alert Routing | Event Tags | Squadcast

Dec 15, 2022 By Squadcast In Squadcast

Alert Routing allows you to configure Routing Rules to ensure that alerts are routed to the right responder with the help of event tags attached to them. This video explains how you can utilise Routing rules to create various incident routing flows.

View Video

Integrating Slack & Squadcast- Trigger, Acknowledge, Resolve & Reassign incidents from Slack channel

Dec 15, 2022 By Squadcast In Squadcast

You can integrate Squadcast and Slack to collaborate efficiently with your team while working on incidents. Squadcast sends a notification to the configured Slack Channel as soon as an incident is triggered.

View Video

Alert Suppression Rules in Squadcast to prevent Alert fatigue | Squadcast

Dec 15, 2022 By Squadcast In Squadcast

Alert suppression can help you avoid alert fatigue by suppressing notifications for non-actionable alerts. Squadcast will suppress the incidents that match any of the Suppression Rules you create for your Services. These incidents will go into the Suppressed state and you will not get any notifications for them.

View Video

Using StatusPage at squadcast | SRE Best practices | Squadcast

Dec 15, 2022 By Squadcast In Squadcast

Let your customers know how your Services are doing, without them having to ask you about it. One of the core principles of SRE is Transparency and Status Pages help you communicate the status of your Services to your customers at all times, as opposed to you getting to know the status of your Services through support tickets logged by your customers.

View Video

Schedules | On-Call Rotations | Set up On-Call Schedules

Dec 15, 2022 By Squadcast In Squadcast

With Squadcast's schedules, you can create as many on-call schedules as you need to support your current team and system structures, much like before. What's new is that you can customize the color in which each schedule appears on the calendar.

View Video

Plesk 360 + Squadcast: Alert Routing Made Easy

Dec 15, 2022 By Vishal Padghan In Squadcast

Plesk is a popular web hosting platform that makes it easier for administrators to set up and manage websites. Its offering, Plesk 360, empowers users to monitor and manage servers more effectively, with features such as fully integrated site and server monitoring that help users keep track of performance and prevent downtime.

Read Post

A New Era for Squadcast

Dec 15, 2022 By Squadcast In Squadcast

Our new brand refresh conveys trust and simplicity in a playful, energetic way — representing our team and product.

View Video

Tagging & Routing at Squadcast | Incident Management | Squadcast

Dec 15, 2022 By Squadcast In Squadcast

Event Tagging is a rule-based auto-tagging system with which you can define customized tags, based on incident payloads, that are automatically assigned to incidents when they are triggered. Auto-add relevant information such as priority, severity, or alert type to make incoming incidents context-rich. Route alerts to the right responder(s) based on the tags they carry.

View Video

Escalation Policy | Round Robin & Advanced Escalations | Incident Assignment Strategies | Squadcast

Dec 15, 2022 By Squadcast In Squadcast

An escalation policy is a collection of rules that define how and when an incident should be escalated. In Squadcast, an incident escalation happens when a responder hands off the incident to another member, and this handoff is subject to specific rules. This video explains how to set up Escalation Policies and the Round Robin Incident Assignment Strategy in Squadcast.

View Video

Integrating Microsoft Teams & Squadcast - Acknowledge, Resolve & Reassign Incidents | Squadcast

Dec 15, 2022 By Squadcast In Squadcast

Teams using MS Teams can now integrate with Squadcast and easily Acknowledge, Resolve & Reassign incidents using MS Teams. You can configure Squadcast to send a notification to the configured MS Teams channel as soon as an incident is triggered.

View Video

APImetrics + Squadcast: Routing Alerts Made Easy

Dec 14, 2022 By Vishal Padghan In Squadcast

APImetrics is an API Compliance, Monitoring and Security solution that lets you make and run API calls or sequences of API calls (workflows) from external, remote cloud locations using exactly the same security configurations as a typical end user would use. If you use APImetrics for API calling requirements, you can integrate it with Squadcast, an end-to-end incident response tool, to route detailed alerts from APImetrics to the right users in Squadcast.

Read Post

SRE Maturity Model: How Do You Assess Your Team?

Dec 14, 2022 By Myra Nizami In Blameless

How do you evaluate your SRE team’s progress in implementing SRE? We discuss the key SRE indicators for evaluating your team’s progress in the SRE maturity model. What is the SRE maturity model? The SRE maturity model is a way of judging how far you are in implementing SRE principles. It is a method used by teams to understand where they ought to implement more SRE best practices to reach greater SRE maturity.

Read Post

Observability Pipelines for an SRE

Dec 14, 2022 By Mezmo In Mezmo

In data management, numerous roles rely on and regularly use observability data. The Site Reliability Engineer is one of these roles. Site Reliability Engineers (SREs) work on the digital frontlines, ensuring performant experiences by using observability data to maintain stability and awareness of software running in various environments across organizations.

Read Post

How to design an effective incident on-call program

Dec 13, 2022 By Blameless In Blameless

If anyone on your team has paged a colleague in the middle of the night, your DevOps team has an incident on-call program. Whether that team member knew who to page, and felt comfortable sending the page, is indicative of your on-call program's effectiveness. Join Thai Wood, founder of Resilience Roundup, and Matt Davis, SRE Advocate at Blameless, as they discuss how to design an effective on-call program. This webinar was recorded live on December 13, 2022.

View Video

Season's Freezings: Change Freezes with Rich Lafferty

Dec 13, 2022 By PagerDuty In PagerDuty

PagerDuty Staff SRE Rich Lafferty joins Scott McAllister and Mandi Walls for a session on Change Freezes, why PagerDuty does them, and how we manage change during times when a majority of folks are out of the office.

View Video

A New Era for Squadcast

Dec 12, 2022 By Anusuya Kannabiran In Squadcast

Our new brand design conveys trust and simplicity in a playful, energetic way - representing our team and product. Get a behind-the-scenes look at our makeover and what it means to our customers' experiences.

Read Post

Using Squadcast's SLO Tracker | Error Budget | Setting up SLOs and configuring SLIs | Squadcast

Dec 12, 2022 By Squadcast In Squadcast

With Squadcast, you can define and monitor Service Level Objectives (SLOs) for your services. SLOs allow you to define and enforce an agreement between two parties regarding the delivery of a given service. An SLO is a reliability target, measured by a Service Level Indicator (SLI), and sometimes serves as a safeguard for a Service Level Agreement (SLA). SLOs represent customer happiness and guide the development team’s velocity.

View Video

Introduction to Service Catalog | Service Ownership | Service Classification | Squadcast

Dec 12, 2022 By Squadcast In Squadcast

To make service management a breeze, we bring to you our improved Service Catalog. The Service Catalog is designed to improve Service Classification and bring more transparency to Service Ownership within your org. This video explains how a consolidated summary of all active services from a single dashboard can help you better track your service health.

View Video

Squadcast Product Demo

Dec 11, 2022 By Squadcast In Squadcast

Squadcast is the only integrated platform that unites on-call alerting and incident management with Site Reliability Engineering (SRE) workflows under one hood and, in turn, automates human tasks efficiently.

View Video

Tag You're It: Organized, Configurable Tagging is a Must-do for Great Incident Analytics.

Dec 8, 2022 By Aaron Lober In Blameless

Wouldn’t it be nice to learn which parts of your service see the most incidents, or why one service experiences more Sev1 incidents than the others? It’s not always easy to see the full disruptive impact of an engineering incident. Even harder to see trends across incidents and over time. Developing incident insights that you can use to help guide and shape the way your team designs and operates your product takes time, careful consideration, team engagement and the right tooling.

Read Post

Swimlane Frameworks and Diagrams for Structured Incident Resolution

Dec 7, 2022 By Blameless Community In Blameless

Orchestrate incident resolution with swimlane software that offers customizable frameworks, unifying your team's diagnostic efforts.

Read Post

Outages ITOps professionals are thankful to avoid

Dec 6, 2022 By meshIQ In meshIQ

As we settle into the time of year when we reflect on what we're thankful for, we tend to focus on important basics such as health, family and friends. But on a professional level, IT operations (ITOps) practitioners are thankful to avoid disastrous outages that can cause confusion, frustration, lost revenue and damaged reputations. The very last thing ITOps, network operations center (NOC) or site reliability engineering (SRE) teams want while eating their turkey and enjoying time with family is to get paged about an outage. These can be extremely costly - $12,913 per minute, in fact, and up to $1.5 million per hour for larger organizations.

Read Post

Toil: Still Plaguing Engineering Teams

Dec 6, 2022 By Damon Edwards In PagerDuty

Our industry has always had localized expressions for work that was necessary but didn’t move the company forward. The SRE movement calls this type of work “toil.” The concept of toil is a unifying force because it provides an impartial framework for identifying — then containing — the work that takes up our time, blocks people from fulfilling their engineering potential, and doesn’t move the company forward.

Read Post
