Latest Posts

Engineering nits: Building a Storybook for Slack Block Kit

Nov 28, 2023 By Lawrence Jones In Incident.io

We care a lot about the pace of shipping at incident.io: moving fast is a fundamental part of our company culture, and out-pacing your competition is one of the best ways we know to win. In engineering teams, one way to ship fast is to invest in tools that make your team more productive. We've become good at identifying small pains and frustrations that slow us down over time and, after surfacing them to the rest of the team, finding solutions for them.

Your incident declaration form is (probably) too long: The power of concise reporting

Nov 27, 2023 By Matilda Hultgren In Incident.io

It’s 10am, your coffee is ready and piping hot, and you have just been paged. Looks like a service is down, and customers are starting to notice. With no time to lose, you open up your organization’s incident declaration form and you spend the next thirty minutes filling out the fifteen required fields, while the incident grows bigger and more complex, messages are rolling in, and your coffee grows cold.

Should data teams consider incident management tools to respond to pipeline issues?

Nov 21, 2023 By Jack Colsey In Incident.io

Data teams are adopting more processes and tools that align with software engineering, and from talks at the dbt Coalesce conference in 2023, there’s clearly a big push towards adopting software engineering practices at enterprise-scale companies. At the moment, there are a lot of tools in the data space for identifying errors in data pipelines, but no tools for responding to these errors, such as coordinating fixes. This is exactly where an incident management platform makes sense to implement.

Incident management really can be for everyone

Nov 14, 2023 By incident.io In Incident.io

Incident management tools are often built for engineers to solve technical issues. On the surface, thinking of incident management as an engineering problem makes sense, and it’s an approach that’s widely used by many organizations from small startups to large enterprises. When there's a problem like a checkout page failure or a server crash, it’s natural for engineers to spring into action, declaring and resolving these incidents.

The price of building your own incident management tool is not what it seems.

Oct 23, 2023 By Asiya Gorelik In Incident.io

Build or buy? An age-old decision that gets made dozens of times a year. It’s quite possibly one of the most important decisions you make as a company. It impacts roadmaps, productivity, team structure, and customer satisfaction (you know, just a few little things). There are a lot of factors to consider, one of the most prominent being cost. So, what exactly are the costs you need to consider when building your own incident management solution?

Learning Flows: Bringing consistency to your post incident processes

Oct 16, 2023 By Luis Gonzalez In Incident.io

To get the most out of your incident response processes, consistency is crucial. The more predictable you can be whenever issues crop up, whether a small bug or a major outage, the quicker and more confidently you can respond. In practice, incident response is equal parts knowing how to actually resolve the issue and having the confidence that the processes in place will help get you through without added stress.

A guide to post-mortem meetings and how we run them at incident.io

Oct 11, 2023 By Luis Gonzalez In Incident.io

You've just made it through a particularly tough incident. It was a short outage affecting a subset of customers, so not exactly the end of the world, but bad enough that it involved multiple people across a number of teams to resolve. Either way, the incident was well managed, and the dust has settled. Now what? Most guidance would say that putting together a post-mortem document is a good idea, given the severity of the incident. You've also done this, so what's next?

Whose fault was it anyway? On blameless post-mortems

Oct 4, 2023 By incident.io In Incident.io

No one wants to be on the receiving end of the blame game—especially in the wake of a major incident. Sure, you know you were the one who made the final change that caused the incident. And hopefully, it was a small one that didn’t cause any SEV-1s. Still, the weight of knowing you caused something bad should be enough, right? Unfortunately, sometimes fingers get pointed, your name gets called, and suddenly, everyone knows that you’re the person who created more work for everyone.

Better learning from incidents: A guide to incident post-mortem documents

Sep 27, 2023 By Luis Gonzalez In Incident.io

If you’re just starting out in the world of incident response, then you’ve probably come across the phrase “post-mortem” at least once or twice. And if you’re a seasoned incident responder, the phrase probably evokes mixed feelings. Just to clarify, here, we’re talking about post-mortem documents, not meetings. It’s a distinction we have to make since lots of teams use the phrase to refer to the meeting they have after an incident.

Clouds, caches and connection conundrums

Sep 26, 2023 By Ben Wheatley In Incident.io

We recently moved our infrastructure fully into Google Cloud. Most things went very smoothly, but there was one issue we came across last week that just wouldn’t stop cropping up. What follows is a tale of rabbit holes, red herrings, table flips and (eventually) a very satisfying smoking gun. Grab a cuppa, and strap in. Our journey starts, fittingly, with an incident getting declared... 💥🚨
