
June 2024

Using AI to understand what sets incident.io apart from the competition

Whenever a new customer joins incident.io, we make notes on what made them choose to buy our product and, if we were in a competitive process, why they chose us over the other providers they were evaluating. It’s a lot of messy data and raw notes, but contained within is a veritable treasure trove of customer feedback. Summarising large amounts of data? Sounds like the perfect job for an LLM.

Redefining incident management: the incident way

Gone are the days when incidents were manual to resolve, invisible to customers, and viewed through a negative lens. This is part two of the virtual event series, in which we dive into our fresh take on what incidents should look like, The Incident Way, and hear customer stories that put these principles into practice.

Managing your resources in Terraform can be literally easy and actually fun

We approached building a Terraform integration with a sense of trepidation. One of the things that motivates us is building features we think people are going to love using, and Terraform integrations are often not that. They have a number of common pitfalls: building resources by hand is tedious and requires a deep understanding of their specification, and importing and managing existing resources is often painful.

How Netflix uses incident.io to power their incident management

Scaling incident management processes can present massive challenges for an organization as large and complex as Netflix. And for Netflix, whose brand has become synonymous with dependability, there’s a lot at stake. Since its introduction to a specific set of Netflix teams, incident.io has been organically adopted far and wide across Netflix Engineering, highlighting just how indispensable and impactful the tool has become.

Our simple incident post-mortem template

Clean, clear, and ready to be customized to suit your needs (Google Docs). Having a dedicated incident post-mortem is just as important as having a robust incident response plan. The post-mortem is key to understanding exactly what went wrong, why it happened in the first place, and what you can do to avoid it in the future.

Scaling into the unknown: growing your company when there's no clear roadmap ahead

During a recent episode of The Debrief, we spoke with Jeff Forde, Architect on the Platform Engineering team at Collectors, about building an incident management program at various stages of growth. In that episode, we called it growth from zero to one, one to two, and two to three. But what happens once you’ve scaled beyond three, and answers to the questions you have become that much harder to find?

Mastering the Sev0

Remind yourself of the worst incident your organization has faced. If you’re lucky, it might have been your entire service being offline for a period of time. If you’re less lucky, perhaps you encountered something affecting the sensitive data your organization is the custodian of. Whilst uncommon, incidents of this severity happen to every organization at some point. A situation this critical is what many refer to as a Sev0, the most severe of incidents.