|
By Chris Evans
Since its launch in 2009, PagerDuty has been the go-to tool for organizations looking for a reliable paging and on-call management system. It’s been the operational backbone for anyone running an ‘always-on’ service, and it’s done the job well. Ask anyone about the product, and you’re all but guaranteed to hear the phrase “it’s incredibly reliable.” I agree. But reliability isn’t everything.
|
By Navo Das
It's no secret that building a data-driven culture in a company is hard, but what is it exactly that makes this such a tricky endeavor? Contrary to popular belief, technology isn't the main hurdle. A recent survey reveals that only a quarter of respondents cite technological limitations as the primary obstacle to becoming data-driven.
|
By Stephen Whitworth
I want to walk you through how incident management has evolved, drawing from real data and the experiences of some of the most sophisticated tech organizations out there. I'll also introduce you to a framework we’ve developed at incident.io: the Incident Maturity Model. This framework is the result of thousands of conversations with companies and provides a clear roadmap to help your organization improve its incident management practices—no matter where you're starting from.
|
By Chris Evans
On August 28th, 2023—right in the middle of a UK public holiday—an issue with the UK’s air traffic control systems caused chaos across the country. The culprit? An entirely valid flight plan that hit an edge case in the processing software, partly because it contained a pair of duplicate airport codes.
|
By Lawrence Jones
Picture this: your alerting system needs to tell you it's broken. Sounds like a paradox, right? Yet that’s exactly the situation we face as an incident management company. We believe strongly in using our own products - after all, if we don’t trust ourselves to be there when it matters most, why should the thousands of engineers who rely on us every day? However, this poses an obvious challenge.
|
By Martha Lambert
At incident.io, we run on a monolith. This brings a whole load of benefits that we don’t want to give up any time soon. We don’t have to worry about the speed of internal network requests, complex deployments, or optimizing work that touches multiple services. This blog post isn’t about the relative benefits of monoliths though (but we’ve written more about that here if you are interested)! Ownership in monoliths is tricky.
|
By Lambert Le Manh
As a provider of incident management software, we at incident.io manage sensitive data regarding our customers. This includes Personally Identifiable Information (PII) about their employees, such as emails, first names, and last names, as well as confidential details regarding customer incidents, such as names and summaries. Consequently, we approach the management of this data with a great deal of care.
|
By Jack Colsey
We've written several times about our data stack here at incident.io, but never about our underlying data warehouse and the design principles behind it. This blog post will run through the high-level structure of our data warehouse and then go in depth into the underlying layers.
|
By Pete Hamilton
Writing a meaningful update for customers every week has been held sacred at incident.io since we started the company. We've written over 200 of them in the past 4 years, and we recently celebrated going 2 years straight without missing a single week. The numbers themselves are not the goal, but the consistency of this habit and what it represents for our customers and our team is very real, and special to me.
|
By Sam Starling
With every job I have, I come across a new observability tool that I can’t live without. It’s also something that’s a superpower for us at incident.io: we often detect bugs faster than our customers can report them to us. A couple of jobs ago, that was Prometheus. In my previous job, it was the fact that we retained all of our logs for 30 days, and had them available to search using the Elastic stack (back then, the ELK stack: Elasticsearch, Logstash, and Kibana).
|
By Incident.io
A full walkthrough of incident.io Response, On-call and Status Pages.
|
By Incident.io
In this episode, we take a look back at 2024 at @incident-io, reflecting on the year’s personal milestones, company-wide changes, and how our product has evolved along the way. Of course, no reflection would be complete without a healthy dose of "banter". Join us as we wrap up the year with insights, laughs, and a look ahead to what's coming in early 2025.
|
By Incident.io
This week, we show how you can manage large-scale incidents by breaking the work down into streams with their own Slack channels and calls.
|
By Incident.io
This week we walk through writing post-mortems in the app, from resolving the incident to building a comprehensive post-incident summary.
|
By Incident.io
Watch Derek's full talk from SEV0 here: https://go.incident.io/a8xPaeB
Create, manage and resolve incidents directly in Slack. Leave the admin and reporting to us.
Improving your incident response, visibility, and ability to learn:
- Less faffing, more fixing: We take care of the admin during incidents, so you can save your brainpower for the decisions that matter.
- Divide and conquer: We make sure everyone’s role is clear, track who’s working on what, and help you escalate if you need extra help.
- Get up to speed, at speed: Get everyone on the same page from the moment they join the incident, and help stakeholders stay in the loop.
- Timelines, in no time: Constructing an incident timeline for review is important, but time consuming. We’ll build one for you in real-time, and keep it constantly up to date.
- Data and insights you can trust: You’ve already paid for your incidents. By surfacing the data you need to make decisions, we help you get your money’s worth.
Incident response for your whole organisation.