
Who's on call? How Claude helped us calculate this 2,500x faster

Schedules are a core part of any on-call system. In ours, they define who to page and when. But people use them in lots of other ways too: checking their next shift, asking for cover while at the gym, keeping a Slack user group up to date, or updating a Linear triage responsibility. For many of our customers, schedules are one of the main ways they interact with our product, and because they’re such a foundational part of On-call, it’s very important they work well.
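To make the "who to page and when" question concrete, here's a minimal sketch of resolving the current on-call from a rotation. Everything in it is an assumption for illustration: the Rotation type, a single layer, and fixed-length shifts. A real schedule model (including ours) is considerably richer, with overrides, multiple layers, and custom handover times.

```go
package main

import (
	"fmt"
	"time"
)

// Rotation is a deliberately simplified schedule: users take
// fixed-length shifts in order, starting at StartsAt. Layers,
// overrides, and custom handovers are ignored in this sketch.
type Rotation struct {
	Users       []string
	StartsAt    time.Time
	ShiftLength time.Duration
}

// OnCallAt returns whoever holds the shift covering t.
func (r Rotation) OnCallAt(t time.Time) string {
	elapsed := t.Sub(r.StartsAt)
	shift := int(elapsed / r.ShiftLength)
	idx := shift % len(r.Users)
	if idx < 0 { // t is before StartsAt: wrap backwards
		idx += len(r.Users)
	}
	return r.Users[idx]
}

func main() {
	r := Rotation{
		Users:       []string{"alice", "bob", "carol"},
		StartsAt:    time.Date(2025, 1, 6, 9, 0, 0, 0, time.UTC),
		ShiftLength: 7 * 24 * time.Hour, // weekly handover
	}
	fmt.Println("on call now:", r.OnCallAt(time.Now().UTC()))
}
```

Even this toy version shows why the question is computational rather than a simple lookup: who's on call is derived from the rotation's parameters at query time, which is exactly the kind of logic that gets expensive when schedules grow more complex.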

What does using AI for post-mortems actually mean?

Everyone is using AI to help with post-mortems now. The pitch is obvious: post-mortems are time-consuming, the blank page is brutal, and AI is very good at producing structured, confident-sounding documents quickly. We're not here to push back on that. We've built AI into our own post-mortem experience, pulling your Slack thread, timeline, PRs, and custom fields together and giving your team a meaningful starting point in seconds. We think that's genuinely valuable, and the teams using it agree.

How it feels to run an incident with AI SRE

We've been building the broader incident.io platform for five years now, and one thing we've learned is that UX matters more here than almost anywhere else. When an incident fires, there's no room for poorly designed interfaces or fumbling through features you haven't touched in a while. The product has to be ergonomic: easy to pick up, easy to navigate, with the right things at your fingertips at exactly the right moment. We've put a lot of effort into getting this right.

Why post-mortem action items die

You can run the best debrief of your life. Honest timeline, blameless tone, real insights. People leave the room nodding. And then nothing happens. This is the last-mile problem of post-mortems - and it's an easy trap to fall into. When you've just been through a stressful incident, getting the system back up is the priority. Once it's over, the post-mortem itself can feel like the finish line. You've documented what happened, been honest about it, identified what went wrong. It feels like the work is done.

How to migrate your paging tool without breaking your team

Most engineering teams don’t migrate their on-call and paging systems unless absolutely necessary. No matter how painful their current solution, it's one of those changes that people put off for as long as possible because the cost is real. The disruption, the retraining, the risk of missing a critical page during the transition. It's not something you do on a whim.

How Catalog changes the game for long-term maintenance

Every incident platform needs to know who owns what. Which team owns which service. Which backlog to send follow-ups to. Which escalation path to page when something breaks. The problem is that most platforms encode this ownership logic separately in every configuration: alert routing, workflows, ITSM ticket syncing, and more. Each one maintains its own copy of the same information, in its own format.
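A sketch may help make this concrete. The types and lookup below are hypothetical, assumed purely for illustration: the point is that one catalog record carries the ownership attributes, and each consumer (alert routing, workflows, ticket syncing) looks them up instead of keeping its own copy.

```go
package main

import "fmt"

// CatalogEntry is a simplified, hypothetical catalog record: one
// place that maps a service to its owning team, escalation path,
// and follow-up backlog.
type CatalogEntry struct {
	Service        string
	OwningTeam     string
	EscalationPath string
	Backlog        string
}

// catalog acts as the single source of truth. Alert routing,
// workflows, and ticket syncing all read from here rather than
// encoding their own copy of the mapping.
var catalog = map[string]CatalogEntry{
	"payments-api": {
		Service:        "payments-api",
		OwningTeam:     "Payments",
		EscalationPath: "payments-oncall",
		Backlog:        "PAY",
	},
}

// routeAlert is one consumer: it resolves ownership at page time
// instead of hardcoding it in the alert-routing config.
func routeAlert(service string) (string, error) {
	entry, ok := catalog[service]
	if !ok {
		return "", fmt.Errorf("no catalog entry for %q", service)
	}
	return entry.EscalationPath, nil
}

func main() {
	path, err := routeAlert("payments-api")
	if err != nil {
		panic(err)
	}
	fmt.Println("paging escalation path:", path)
}
```

When ownership changes, you update the one catalog entry and every consumer picks it up automatically, which is what makes long-term maintenance tractable.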

The post-mortem problem

Post-mortems are one of the most consistently underperforming rituals in software engineering. Most teams do them. Most teams know theirs aren't working. And most teams reach for the same diagnosis: the templates are too long, nobody has time, and nobody reads them anyway. These aren't wrong observations. But they're symptoms, not causes. The actual problem is that somewhere along the way, the post-mortem stopped being a piece of communication and became a compliance artifact.

Keeping it boring: the incident.io technology stack

At incident.io we run a deliberately simple technology stack. Keeping things boring has allowed us to scale from a few hundred customers to several thousand, while having only two platform engineers. In this post I'll walk through the stack, explain some of the choices we've made, and touch on the challenges we're facing as we grow.

Secure access at the speed of incident response

Picture this: it's 2am, your pager goes off, and you're staring at a production database that's on fire. You know exactly what's wrong. You know exactly how to fix it. But you can't touch anything because you're waiting on someone to approve your access request. Meanwhile, your customers are down, your SLAs are bleeding out, and you're refreshing Slack hoping someone in security is awake to click "approve." This is the incident response tax that too many teams pay.

Everything you need to know about ITIL 5, AI and incident management

ITIL 5 launched in January 2026, and for the first time in the framework's 40-year history, AI governance is front and center. If you're running incident management, on-call rotations, or building operational tooling, this matters: the gap between AI adoption and AI governance is about to become a compliance and operational risk issue. I’m not usually a big ITIL fan, but this guidance has some genuinely useful framing and questions.