Operations | Monitoring | ITSM | DevOps | Cloud

SRE

The latest News and Information on Service Reliability Engineering and related technologies.

G2 Winter Report 2023: Squadcast Maintains Leadership in IT Alerting and Incident Management

2023 has been a year of significant growth for Squadcast, with an expanding presence in both Mid-Market and Enterprise segments across IT Alerting and Incident Management categories. And with the release of the G2 Winter Report '23, it's an opportune moment to share some of our key achievements.

A Little Resilience Goes A Long Way

‍ Let’s call this the mother of all understatements. If you’re reading this blog, there’s a good chance that you: ‍ a.) Agree wholeheartedly with this sentiment and think it should go without saying, AND… b.) Are surrounded by folks who pay lip service to this idea while not taking it as seriously as they should.

How To View Previous Incidents To Gain Helpful Context During Incident Triage?

Picture this: you're knee-deep in resolving a P1/P0 incident, urgently seeking answers. What if you could tap into past incidents to get important incident insights and streamline your troubleshooting process? In this blog, we pitch into the practical aspects of leveraging Squadcast's Past Incidents feature to help you enhance your Incident Management process.

SRE Essentials: Building a Team and Culture

What differentiates tech companies that weather digital storms with unwavering resilience? In many cases, the answer lies in a deeply ingrained SRE culture, which fosters proactive approaches to system reliability. Site Reliability Engineering (SRE) culture extends beyond mere tech tools and automated scripts. It emphasizes proactive care, shared responsibility, and continuous improvement, leveraging incident management software as a vital component in fostering these core values of SRE.

Lessons in Incident Response I Learned While Waiting Tables

Before I stumbled into the tech industry (a story for another day), I spent several years in the customer service world as a server and front-of-house manager in restaurants. It was in these jobs that I first honed some critical skills that would later lead me on the path to incident response.

Incident vs Bug: Understanding the Key Differences

Incidents and bugs are two common occurrences that can disrupt the smooth operation of systems and applications. While these terms may seem similar, they represent distinct concepts with different implications. Understanding the nuances between incidents and bugs is crucial for effective incident management and proactive problem resolution.

Comparing Uptime Monitoring, Heartbeat Monitoring, and Synthetic Monitoring

In the quest for a high-velocity development environment, one fundamental question looms large: "How can you ensure an exceptional end-user experience when an array of engineers continually push and deploy code?" The unequivocal answer to this pivotal inquiry lies in the establishment of robust, straightforward, and well-defined monitoring practices.