Latest Posts

Mastering regulatory compliance with incident.io

Oct 14, 2024 By Chris Evans In Incident.io

The origin of incident.io goes back to our days building Monzo, a UK-based bank, where Stephen, Pete, and I first crossed paths. As a bank, compliance with numerous regulations was, unsurprisingly, a top priority. When it came to incident management—something we were very involved in—this meant that every aspect of reporting, policy adherence, and root cause analysis (or "contributing factors," as we called it) had to be managed consistently and meticulously.

What is a SEV1 incident? Understanding critical impact and how to respond

Oct 11, 2024 By Kate Bernacchi-Sass In Incident.io

In the world of incident management, a SEV1 incident is something of lore: you’ve either heard the tales of the critical outages that result in widespread disruption and chaos, or you’ve lived through one (and lived to tell the tale). SEV1 incidents are a game-changer. When one hits—think major outages or critical failures—it can seriously impact a business, leading to lost revenue, unhappy customers, and a whole lot of chaos.

Why I like discussing action items in incident reviews

Oct 7, 2024 By Chris Evans In Incident.io

Are incident reviews about learning or tracking actions? This question has sparked recent debate in incident management circles, including in my recent panel at SEV0 and in Lorin Hochstein’s post. Should the goal of an incident review be learning, or should it focus on tracking actionable improvements? When is the right time to discuss actions, and are they picked up just to make us feel better? From my experience, learning from incidents and identifying actions are inseparable.

incident.io is best in class for momentum, relationships and enterprise adoption

Oct 1, 2024 By incident.io In Incident.io

Trust doesn’t just happen overnight. For us at incident.io, it’s been a journey—one that’s focused on people just as much as the product. From the start, we knew that building great incident management software wasn’t just about creating features and functionality. It was about building relationships, understanding our users, and truly being there for them when it matters most. Our focus has always been to help teams manage incidents better.

What does SLO stand for? A complete guide to Service Level Objectives (SLOs)

Sep 12, 2024 By Kate Bernacchi-Sass In Incident.io

The world of tech is full of acronyms. SLOs are one of those that everyone talks about, but maybe not everyone fully gets. Whether you're nodding along in meetings or just hearing “SLO” for the first time, we’ve got you covered. In this post, we’ll break down what Service Level Objectives (SLOs) actually are, why they matter, and how they can help keep your systems (and your sanity) in check.

The ultimate guide to on-call schedules

Sep 12, 2024 By Chris Evans In Incident.io

An Ultimate Guide to on-call schedules? You might think this sounds overly grandiose for what’s essentially putting people into a list and rotating through them. But you’d be flat-out wrong. Getting your on-call setup right is as real and as important as it gets, and getting it wrong can lead to prolonged incidents, burnt-out employees, and a damaged company reputation.

Data quality testing

Sep 4, 2024 By Lambert Le Manh In Incident.io

Data quality testing is a subset of data observability. It is the process of evaluating data to ensure it meets the necessary standards of accuracy, consistency, completeness, and reliability before it is used in business operations or analytics. This involves validating data against predefined rules and criteria, such as checking for duplicates, verifying data formats, ensuring data integrity across systems, and confirming that all required fields are populated.

A new era for Catalog

Aug 28, 2024 By Charlie Kingston In Incident.io

Last year, we released Catalog—the connected map of everything in your organization. Catalog was built with the aim of tackling one of the most painful parts of incident response: contextualizing problems and understanding their place within your organization.

Building On-call: Our observability strategy

Aug 22, 2024 By Martha Lambert In Incident.io

At incident.io, we run an on-call product. Our customers need to be sure that when their systems go wrong, we’ll tell them about it—high availability is a core requirement for us. To achieve the level of reliability that’s essential to our customers, excellent observability (o11y) is one of the most important tools in our belt. When done right, observability improves your product experience from two angles.

Introducing: incident.io for Microsoft Teams

Aug 13, 2024 By Ed Dean In Incident.io

There’s a major outage. Support tickets are mounting. Everybody from engineering to legal is scrambling for information. You have more Teams notifications clamouring for attention than you do minutes to address them, and it’s hard to know where to begin. What comes next is a balancing act—mitigating the impact, updating colleagues, managing action items, or updating a status page that will be seen by millions.
