
FireHydrant

More than downtime: the opportunity costs of poor incident management

In my last blog post, I wrote about the explicit costs of incidents: the ones you can easily track in dollars lost. But the cost of incidents goes beyond the time spent resolving them. While we're busy managing incidents (including mitigation and retrospectives), we incur a large opportunity cost: time we aren't spending on releasing the next big thing.

More than downtime: the explicit costs of poor incident management

A cold fact of SaaS Life™ is that you can't make money when your product or website doesn't work, and those lost dollars add up fast. Downtime, SLA breach paybacks, compliance fines, and other explicit costs are the easiest to quantify, and they're what most people think of when they think about incidents.

Exploring distributed vs centralized incident command models

Recently in our Better Incidents Slack channel, there's been some chatter about how people structure the incident commander role at their companies: distributed or centralized. The way I see it, there are two types of commanders. The first is the temporary, distributed role, a hat that an on-call engineer or an engineering manager puts on during an incident. The second is the centralized, full-time role, where someone is the designated incident commander (or one of a few) for all incidents.

Custom fields: make FireHydrant your personalized incident management platform

Today we're releasing custom fields, a powerful new feature that lets you tailor FireHydrant to your organization's specific needs and capture essential incident details. Custom fields help you track critical states, involved parties, resolution specifics, affected services, messages, and more (almost anything you want!), all aligned with your unique workflows. Whatever the size of your team or the maturity of your processes, custom fields adapt to the way you work.

210% ROI: unlocking the economic value of FireHydrant for incident management

In the fast-paced, high-tech industry, efficient incident management is a critical factor in maintaining brand reputation, employee morale, and, most importantly, your bottom line. Good practices can result in reduced downtime, more opportunities to learn from incidents, and an enhanced reputation among both the engineering community and customers. But quantifying the true cost of incidents has always been a challenge, until now.

Align platform and product engineering teams over incidents

I firmly believe in never letting a good incident go to waste. Incidents expose weak spots and create opportunities for medium- and long-term investments. By analyzing incidents and understanding their root causes, organizations can identify areas that need additional resources or improvements. When you use incidents to align your platform and product engineering teams, you open up opportunities to improve the performance and security of your product.

Incident severity: why you need it and how to ensure it's set

Defined severity levels quickly get responders and stakeholders on the same page about the impact of an incident, and they set expectations for the level of response effort. Both help you fix the problem faster. But sometimes, for whatever reason, a severity level just doesn't get set. Maybe there's confusion about which severity level to use. Or maybe you have a low barrier to declaration and your responders just need a little nudge.

Upgraded role-based access control brings more visibility (and control) to incident management at your organization

We've long believed that incidents (and technical team cultures) improve when more people are empowered to declare incidents, follow them, and contribute to their resolution. But not everyone in an organization needs to be able to manage the processes, rules, and settings a company enforces for its incident program.

FireHydrant Private Incidents & Runbooks: more control for you, more security for your customers

Ensuring the privacy and security of sensitive information is crucial no matter your company's size or industry. So when an incident involves sensitive information, such as Personally Identifiable Information (PII), financial data, accidental data breaches, or legal matters requiring privileged communication, your response process might need a higher level of security and discretion.

The "people problem" of incident management

Managing incidents is tricky enough already, and you want to get to mitigation as quickly as possible. But sometimes organizing everything surrounding an incident feels more difficult than solving the actual technical problem, and you end up delayed or sidetracked during mitigation. We call that scenario the "people problem" of incident management.