Operations | Monitoring | ITSM | DevOps | Cloud

Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Reflecting on a momentous 2023 at incident.io

2023 at incident.io was a year to remember. While it's easy to be cyclical about proclaiming that every year was better than the last, a few things stand out that made 2023 truly a year for the books. TL;DR, a lot happened! Especially when you consider that a lot of things didn't make the list above. So as we turn the page to 2024, we wanted to take a moment to reflect on the transformative year that was 2023, not only for us but our customers as well.

The Debrief: Incident management for data teams

If you're on a data team, have you ever considered using an incident management tool to respond to pipeline issues? If the answer is no, then you might want to check out this episode. Here, we chat with Jack, Data Analyst at incident.io, to better understand why data teams can—and should—look to incident management tools like incident.io to manage issues. We chat about.

The Debrief: A year in review-2023 at incident.io

What a year 2023 was at incident.io! While it's hard to summarize 365 days, a few things stand out: So as we close the curtain on 2023, we sat down with the three co-founders of incident.io to do a bit of reflection on the wild ride that was this year. In this episode you'll hear them discuss challenges, big wins, moments of growth, what's next for us, and most importantly, what the three co-founders like most about one another.

How To View Previous Incidents To Gain Helpful Context During Incident Triage?

Picture this: you're knee-deep in resolving a P1/P0 incident, urgently seeking answers. What if you could tap into past incidents to get important incident insights and streamline your troubleshooting process? In this blog, we pitch into the practical aspects of leveraging Squadcast's Past Incidents feature to help you enhance your Incident Management process.

Setting the foundations for on-call that's fair, balanced, and human-focused

Whenever you're providing a service to businesses or individuals that they rely on, it's important to make sure that it's up and running as much as possible without disruptions. But the reality is that, despite your best efforts, downtime does happen. Regardless of when incidents strike, whether it’s 2 PM in the middle of the working day or 2 AM, it's important to have people available to diagnose and resolve issues as soon as possible.

SRE Essentials: Building a Team and Culture

What differentiates tech companies that weather digital storms with unwavering resilience? In many cases, the answer lies in a deeply ingrained SRE culture, which fosters proactive approaches to system reliability. Site Reliability Engineering (SRE) culture extends beyond mere tech tools and automated scripts. It emphasizes proactive care, shared responsibility, and continuous improvement, leveraging incident management software as a vital component in fostering these core values of SRE.

Tracking developer build times to decide if the M3 MacBook is worth upgrading

All incident.io developers are given a MacBook which they use for their development work. That meant when Apple released the M3 MacBook Pros in October, people naturally started asking questions like “wow, how much more productive might I be if my laptop looked that good?” and “perhaps we’d be more secure if our machines were Space Black 🤔” Pete’s (our CTO) response to this was “if you can prove it’s worthwhile, we’ll do it”