Incident.io

Learning from incidents is not the goal

Learning from incidents has become something of a hot topic within the software industry, and for good reason. Analyzing mistakes and mishaps can help organizations avoid similar issues in the future, leading to improved operations and increased safety. But too often we treat learning from incidents as the end goal, rather than a means to achieving greater business success. The goal is not for our organisations to learn from incidents: it’s for them to be better, more successful businesses.

Trust shouldn't start at zero

How often have you heard the phrase “trust is earned” in life? While well-meaning, I think this can actually lead to some strange behaviour at work, especially when you’re on a fast-growing team. Startups are full of chaos and unknowns that your teams need to navigate, so it’s vital to know you can trust the people around you. As you grow, the expectations you set around trust as people join your team can affect your ability to hire, onboard, ship and, ultimately, survive.

Reflecting on one of the biggest incidents in our history

We have to come clean. During KubeCon, we experienced an incident that we weren’t ready to discuss until now. This incident caused quite a disruption and, had it been left unresolved, would have had a massive snowball effect. At the time, we didn’t want to raise any alarms, so we kept it quiet while our team rallied to resolve it. And to be honest, most folks probably didn’t even realize that it happened since we moved so quickly.

It's time to rethink the way you do external comms

April was a month to remember at incident.io. Not only did we attend our second-ever conference, KubeCon in Amsterdam, but we also very subtly released our brand-new Status Pages product. OK, it probably wasn't subtle. Both moments required months of preparation, feedback loops, iteration, and so much more behind-the-scenes work to get right. So if you ran into us at KubeCon, thank you for stopping by and meeting with our team.

9 incident management solutions to improve your workflows

Incident management is a team effort. While it's true that incident management should be seen as a company-wide effort, and you should empower all teams to declare incidents, this differs from the team effort I'm referring to here. No, incident management is a team effort in the sense that no one tool can do it all, not even incident.io. We covered as much when we discussed why we integrate with tools that can be seen as our competitors – and that’s OK!

Battling database performance

Earlier this year, over a period of two weeks, we experienced intermittent timeouts in our application while it interacted with our database. Despite our best efforts, we couldn’t immediately identify a clear cause: there were no code changes that significantly altered our database usage, no sudden changes in traffic, and nothing alarming in our logs, traces, or dashboards. During that two-week period, we deployed 24 different performance- and observability-focused changes to address the problem.

How we built it: incident.io Status Pages

We kicked off 2023 with a new team and a new product to build: Status Pages. We wanted to build a solution we could ship to customers as quickly as possible, while making sure that it was reliable, fast and beautiful. Here’s how that process played out over the course of three months.

Announcing incident.io Status Pages - powering clear external comms to build trust

Clear and frequent communication carries considerable weight in today's era of hyper-competition among businesses, especially during incidents. Because of this, status pages have become the go-to choice for companies looking to prioritize trust, transparency, and clarity with their customers, even during downtime. Unfortunately, current status page solutions have made these communications particularly frustrating and stressful.

Our A, B, Cs of external communications

Communication carries more weight than ever before. Businesses are far more connected to their customers given the number of channels they can communicate through: Twitter, Instagram, Facebook, and even TikTok. Because of this, it's essential to prioritize these lines of communication throughout your day-to-day. Some might even say that over-communicating is the best way forward. Why? No one likes a company that looks like a black box, offering zero insight into what's happening.