Latest Posts

Effective incident escalations

In the ever-evolving digital landscape, every organization must confront its fair share of incidents. Regardless of the sector or size, one common thread weaves through them all: the need for effective incident management. A crucial part of this management is incident escalation, a topic on which we've had many discussions with various companies.

Synchronizing mental models

In the heat of an incident, having a clear and shared understanding of what’s going on is absolutely crucial to effective response. But often what actually happens is that people involved in incidents build their own picture and narrative of the event, shaped by their own expertise, their past experiences, and what they’re seeing and hearing as the incident develops. The picture and perspective each person builds is often referred to as a mental model.

Announcing Catalog - the connected map of everything in your organization

One of the most painful parts of incident response is contextualizing the problem and understanding how and where it fits within your organization. If responders are unable to answer basic questions about it, you waste valuable time talking to the wrong people or solving the wrong problems, ultimately extending impact and hurting your response. It’s a common issue that, up until now, didn’t have a clear solution or workaround.

How our product team use Catalog

We recently introduced Catalog: the connected map of everything in your organization. In the process of building Catalog as a feature, we’ve also been building out the content of our own catalog. We'd flipped on the feature flag to give ourselves early access, and as we went along, we used this to test out the various features that Catalog powers.

Services are not special: Why Catalog is not just another service catalog

As you may have already seen, we’ve recently released a Catalog feature at incident.io. While designing and building it, we took an approach that’s a tangible departure from a traditional service catalog. Here’s how we’re different, and why.

Using DORA metrics Mean Lead Time for Changes to deliver iterations faster

Raise your hand if you like shipping changes quickly. (Yes, let's assume that everything you're shipping has value and isn't a vanity project.) Chances are, you, the person reading this now, agreed with the above. When you start on a project, big or small, you want to keep any changes moving along and avoid getting stuck. The less time between the beginning and end of a project, the faster you can shift your focus to other things.

Learning from incidents is not the goal

Learning from incidents has become something of a hot topic within the software industry, and for good reason. Analyzing mistakes and mishaps can help organizations avoid similar issues in the future, leading to improved operations and increased safety. But too often we treat learning from incidents as the end goal, rather than a means to achieving greater business success. The goal is not for our organizations to learn from incidents: it’s for them to be better, more successful businesses.

Trust shouldn't start at zero

How often have you heard the phrase “trust is earned” in life? While well-meaning, I think this can actually lead to some strange behaviour at work, especially when you’re on a fast-growing team. Startups experience a lot of chaos and unknowns that your teams need to navigate, so it’s vital to know you can trust the people around you. As you grow, how you set expectations around trust as people join your team can impact your ability to hire, onboard, ship, and ultimately survive.

Reflecting on one of the biggest incidents in our history

We have to come clean. During KubeCon, we experienced an incident that we weren’t ready to discuss until now. This incident caused quite a disruption and, had it been left unresolved, would have had a massive snowball effect. At the time, we didn’t want to raise any alarms, so we kept it quiet while our team rallied to resolve it. And to be honest, most folks probably didn’t even realize that it happened since we moved so quickly.