October 2023

Building on Chaos Toolkit's Foundation: New Features for Resilience Engineering

Oct 31, 2023 By Reliably In Reliably

On October 26th, 2023, we had the pleasure of hosting Manuel Castellin, a seasoned expert in chaos engineering and Terraform. He took us through two real-world examples showing how to overcome the challenges of implementing chaos engineering when your infrastructure isn’t initially prepared for it, and how to experiment securely on production systems. In the second part of the meetup, Sylvain Hellegouarch, Chaos Toolkit lead developer and Reliably CEO, gave a quick demo of how to use Reliably to build your experiments in a less code-centric, more visual way.

View Video

Reliably

DevOps
SRE

Challenges with Running Prometheus at Scale

Oct 31, 2023 By Last9 In Last9

Understanding the limitations and challenges of scaling Prometheus in modern cloud-native environments. Here we delve into long-term retention, downsampling, high availability, and other challenges.

Read Post

Last9

Introducing Squadcast's Global Event Rulesets | Incident Management | Squadcast

Oct 30, 2023 By Squadcast In Squadcast

This video gives you a walkthrough of Squadcast's new feature, 'Global Event Rulesets', which helps you simplify alert routing and boost efficiency. Global Event Rulesets enable you to manage alert routing across services and automate actions based on predefined rulesets.

View Video

Squadcast

Secret to Flawless Deployments: Real-Time Canary Deployment tracking with Argo CD & Levitate!

Oct 28, 2023 By Last9 In Last9

Most of your outages are probably caused by a change, and having observability around that will make a lot of difference. Dive into this walkthrough, where we showcase tracking Canary deployments in Argo CD, correlating events and metrics seamlessly with Levitate. It is aimed at Site Reliability Engineers, DevOps engineers, Software Engineers, and Product Managers who want to elevate their observability and ensure smooth deployments every time.

View Video

Last9

Tips To Never Miss An Incident Notification With Squadcast Escalations Policies

Oct 27, 2023 By Chitra Bisht In Squadcast

Companies implement an Incident Response process to promptly resolve critical issues. Setting up escalation policies to notify engineers is a key step in this process. With traditional escalation policies, alert notifications still get missed, which results in higher response times and failure to meet SLAs. So, how can one ensure incident notifications are never missed?

Read Post

Squadcast

Opsgenie Alternatives: Finding the Right Fit for your Incident Management Teams

Oct 27, 2023 By Chitra Bisht In Squadcast

In the dynamic landscape of modern IT operations and Incident Management, choosing the right tool is paramount to ensuring the resilience of your organization. Opsgenie, a popular Incident Response and Alerting platform, has been a go-to choice for many. However, as businesses grow and requirements evolve, exploring Opsgenie alternatives becomes essential in the quest to find the perfect fit for your unique operational needs. In this blog, we'll embark on a journey to uncover and evaluate some compelling alternatives to Opsgenie, helping you navigate the vast sea of options and make an informed decision that aligns perfectly with your team's workflows and objectives.

Read Post

Squadcast

Webinar: Streamlining Incident Management With Automation and Contextual Awareness

Oct 27, 2023 By Squadcast In Squadcast

In the modern context of distributed teams & complex digital infrastructure, major incidents having a negative impact spanning multiple teams and services can cause a barrage of alerts. While a meticulously designed incident response strategy can aid in restoring order, it's essential to underscore the significance of providing responders with effective tools that offer contextual understanding and facilitate the identification of actionable alerts.

View Video

Squadcast

MSP's As NOC's, Handling Multiple Clients

Oct 26, 2023 By Chitra Bisht In Squadcast

A Managed Service Provider (MSP) should invest in an Incident Management platform to ensure seamless service delivery and customer satisfaction. Such a platform streamlines Incident Response, improves service reliability, and enhances communication among teams. It helps MSPs reduce Mean Time to Acknowledge (MTTA) and Mean Time to Resolve (MTTR) for incidents, thereby minimizing downtime and service disruptions.

Read Post

Squadcast

Elevating Incident Management: Leveraging automation and AI to put reliability on autopilot

Oct 26, 2023 By Blameless In Blameless

If your company operates in a modern digital environment, then there’s a good chance questionable reliability is hurting you competitively. On the other hand, every hour your engineering team spends on operations comes at the expense of developing your product. So, what are you supposed to do?

View Video

Blameless

RapidSpike + Squadcast: Routing Alerts Made Easy

Oct 25, 2023 By Vishal Padghan In Squadcast

RapidSpike is a website monitoring solution that focuses on all three key aspects of website health: performance, reliability and security in a single dashboard. If you use RapidSpike for your website monitoring requirements, you can integrate it with Squadcast, an end-to-end Incident Response tool, to route alerts from RapidSpike to the right users in Squadcast with ease.

Read Post

Squadcast

What is a Pull Request and Why You Need Them

Oct 25, 2023 By Anjali Udasi In Zenduty

As an engineer, you're probably familiar with version control systems like Git that let you track changes to your codebase. But are you using one of the most useful features built around Git: pull requests? If not, you're missing out. Pull requests are one of the best ways to collaborate on projects and create better code. In this article, we'll go over what a pull request is, why you should be using them, and how to create your own pull requests.

Read Post

Zenduty

Navigating the SRE Landscape | Better Incidents Podcast Ep. 9

Oct 24, 2023 By FireHydrant In FireHydrant

View Video

FireHydrant

Why Invest in Tooling? Benefits and Concerns

Oct 24, 2023 By Emily Arnott In Blameless

When looking to invest money in your engineering teams, what gives the best return? Hiring more staff to enable bigger projects and more diversified skill sets? Training engineers to uplevel their ability and productivity? Increasing salaries to retain the best talent? These are all great ideas that should be exercised often. But there’s one other investment worth considering that can offer huge benefits for relatively small amounts of money: tooling.

Read Post

Blameless

Blue Matador + Squadcast: Alert Routing Simplified

Oct 19, 2023 By Vishal Padghan In Squadcast

Blue Matador is the fastest, easiest way to set up AWS infrastructure monitoring, allowing small teams to fully monitor their cloud operations with no manual setup. If you use Blue Matador for your cloud monitoring requirements, you can integrate it with Squadcast, an end-to-end Incident Response tool, to route alerts from Blue Matador to the right users in Squadcast with ease.

Read Post

Squadcast

Squadcast Unveils Enhanced Status Pages

Oct 19, 2023 By Squadcast In Squadcast

Big News! Squadcast's Enhanced Status Page(s) are LIVE!

View Video

Squadcast

Behold a brand New Incident Dashboard!

Oct 18, 2023 By Menahi Shayan In Zenduty

The incidents page, the most visited page on Zenduty, has an all-new look and feel! It's been completely redesigned from the ground up to be faster, easier to use, and more visually appealing. The Incidents list now dedicates more space to important information, such as the title, date, priority, and more. The UI is also more polished, trimming whitespace where it is unnecessary. The avatars have been redesigned with more pastel shades, resulting in an overall design that is far more soothing to the eye.

Read Post

Zenduty

Introducing Past Incident Feature | Incident Context and History | Squadcast

Oct 18, 2023 By Squadcast In Squadcast

Introducing Squadcast's Past Incidents feature which helps incident responders by presenting them with past incidents related to the same service. It employs data science techniques to match and display a historical list of similar incidents from the same service you are currently investigating. This aids in expediting issue resolution by offering valuable insights, such as historical context, prior incident details, timing patterns, and past solutions.

View Video

Squadcast

What is Prometheus Alertmanager?

Oct 16, 2023 By Anjali Udasi In Zenduty

Prometheus Alertmanager is a powerful tool designed to handle various alerts generated by Prometheus. It plays a vital role in the overall monitoring ecosystem, acting as a centralized hub for managing alert notifications. With Prometheus Alertmanager and its robust notification management capabilities, you can efficiently define alert routing and notification policies. This empowers you to take timely actions and mitigate potential issues before they impact your service availability.

Read Post

Zenduty

Blameless Unveils New Terraform Provider to Elevate Workflow Management at Scale

Oct 12, 2023 By Blameless In Blameless

Leading Incident Management Solution Enhances Control, Automation, And Security Workflow With Terraform's Lightning-Fast Resource.

Read Post

Blameless

G2 Fall Report Positions Squadcast among the leading Incident Management, and IT Alerting Tools

Oct 12, 2023 By Sanjog Sandhu In Squadcast

Squadcast established itself as a Momentum Leader and High Performer across different regions in the Incident Management and IT Alerting tool categories. We have solidified our leadership in the Mid Market segment across various regions; this recognition stems from our dedicated customer base.

Read Post

Squadcast

A Detailed Guide to Setting Up Effective On-Call Rotations

Oct 11, 2023 By Chitra Bisht In Squadcast

On-Call Schedules are predefined rotations/shifts assigning team members to be available for incident response at specific times. They are essential for ensuring round-the-clock support, swift issue/incident resolution, and continuous service availability. For a robust On-Call system, proper schedules are essential, serving as the backbone of reliable Incident Response and ensuring your team is well-prepared to address technical challenges effectively.

Read Post

Squadcast

Three Ways to Better Appreciate your SREs and DevOps Engineers

Oct 10, 2023 By Emily Arnott In Blameless

DevOps engineers and Site Reliability Engineers are vitally important to the continued health of your product and business. We all know it’s true, and yet people in these roles often feel underappreciated and undervalued. This sort of work runs into the issue of “when process and infrastructure break, it gets shoved in the spotlight; but when everything works perfectly, no one notices.”

Read Post

Blameless

Unified Incident Management: Merits of Combined On-Call and Incident Response | Squadcast

Oct 6, 2023 By Squadcast In Squadcast

In this session, we explore the crucial aspects of effective on-call management and incident response in product organizations. Squadcast combines On-Call and Incident Response into a single platform using automation capabilities for enhanced reliability, continuous learning, and better productivity.

View Video

Squadcast

AI is not intelligence: Bill Kennedy - The Reliability Podcast

Oct 6, 2023 By Last9 In Last9

The Reliability podcast aims to speak with engineers who have worked on large, complex systems and glean their learnings. What best practices should one imbibe? What are non-negotiable learnings to become better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these and more, tracing the personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

View Video

Last9

Bugs in NASA's codebase: Bill Kennedy - The Reliability Podcast

Oct 6, 2023 By Last9 In Last9

View Video

Last9

Choosing the Right Career Path in Tech: Software Engineering vs. Site Reliability Engineering (SRE)

Oct 6, 2023 By Anjali Udasi In Zenduty

The tech industry is booming, and there are many different career paths. But two of the most popular and in-demand roles are Software Engineering and Site Reliability Engineering (SRE). Site Reliability Engineering (SRE) blends elements of software engineering with IT operations, focusing on reliability. On the other hand, Software Engineering (SWE) involves designing, developing, testing, and deploying software applications.

Read Post

Zenduty

Writing code with empathy: Bill Kennedy - The Reliability Podcast

Oct 5, 2023 By Last9 In Last9

View Video

Last9

The job of a backend dev: Build good ACs: Bill Kennedy - The Reliability Podcast

Oct 5, 2023 By Last9 In Last9

View Video

Last9

Losing customers because of bad software: Bill Kennedy - The Reliability Podcast

Oct 5, 2023 By Last9 In Last9

View Video

Last9

What you do in practice is what you do in a game: Bill Kennedy - The Reliability Podcast

Oct 5, 2023 By Last9 In Last9

View Video

Last9

Bugs in NASA's codebase and importance of QA in engineering: Bill Kennedy - The Reliability Podcast

Oct 5, 2023 By Last9 In Last9

View Video

Last9

The Rise of Generative AI

Oct 5, 2023 By Blameless In Blameless

Revolutionizing Business: The Rise of Generative AI - Actionable Strategies to Integrate Advanced AI Seamlessly into Your Engineering Operations.

View Video

Blameless

Global Event Rulesets: Streamlining Alert Routing Across Services

Oct 4, 2023 By Vishal Padghan In Squadcast

In the fast-paced world of organizations handling numerous microservices and projects, tackling the challenges that arise can be a daunting task. Because many of our customers have infrastructures that include a large number of microservices, we set out to make it easier for them to streamline alert source management. Enter Global Event Rulesets (GER). This feature is designed to redefine the way you manage alerts.

Read Post

Squadcast

What is Zero Trust Reliability in engineering: Piyush Verma - The Reliability Podcast

Oct 4, 2023 By Last9 In Last9

View Video

Last9

Production vs Local in engineering: Piyush Verma - The Reliability Podcast

Oct 4, 2023 By Last9 In Last9

View Video

Last9

Blameless Introduces The First Generative AI-powered, Automated Incident Communications With Comms Assistant

Oct 3, 2023 By Blameless In Blameless

Revolutionizing Incident Communications, Blameless Introduces Generative AI To More Fully Automate Incident Communication Workflows.

Read Post

Blameless

Fostering a fearless engineering culture: Bill Kennedy - The Reliability Podcast

Oct 3, 2023 By Last9 In Last9

View Video

Last9

The mistake boot in engineering: Bill Kennedy - The Reliability Podcast

Oct 3, 2023 By Last9 In Last9

View Video

Last9

What's missing in engineering today?: Bill Kennedy - The Reliability Podcast

Oct 3, 2023 By Last9 In Last9

View Video

Last9

Engineers should have a desire to find bugs: Bill Kennedy - The Reliability Podcast

Oct 3, 2023 By Last9 In Last9

View Video

Last9

The only industry not licensed to do their job - Engineering: Bill Kennedy - The Reliability Podcast

Oct 3, 2023 By Last9 In Last9

View Video

Last9

Software is ubiquitous and can change our mood: Piyush Verma - The Reliability Podcast

Oct 3, 2023 By Last9 In Last9

View Video

Last9

My job is an engineer = build ACs: Bill Kennedy - The Reliability Podcast

Oct 3, 2023 By Last9 In Last9

View Video

Last9

The job of a backend dev: Build good ACs - Bill Kennedy - The Reliability Podcast

Oct 3, 2023 By Last9 In Last9

View Video

Last9

A Journey through the Blameless Resource Library

Oct 3, 2023 By Emily Arnott In Blameless

From the very beginning of Blameless, we had two vital missions. First, to offer a solution to what we saw as a mounting crisis of reliability by offering a comprehensive, easy-to-use reliability platform. Second, to educate the companies facing this crisis on the fundamentals of incident management, cutting-edge best practices, and the cultural values that sustain learning and growth.

Read Post

Blameless

The WeWork-ization of software: Piyush Verma - The Reliability Podcast

Oct 3, 2023 By Last9 In Last9

View Video

Last9

The most beautiful thing about Kubernetes: Bill Kennedy - The Reliability Podcast

Oct 3, 2023 By Last9 In Last9

View Video

Last9

Stop using debuggers, learn a mental model of a codebase: Bill Kennedy - The Reliability Podcast

Oct 3, 2023 By Last9 In Last9

View Video

Last9

In engineering, DON'T BUILD FAST: Bill Kennedy - The Reliability Podcast

Oct 3, 2023 By Last9 In Last9

View Video

Last9

Working Effectively With Executives During an Incident

Oct 2, 2023 By Ashley Sawatsky In Rootly

You’re in the incident channel rocking yet another incident. Comms are flowing, resolution is in sight, the team is grinding, and you’re feeling good. Then…

Read Post

Rootly
