Operations | Monitoring | ITSM | DevOps | Cloud

FireHydrant

Assembly time is where you have the most control of an incident

The FDNY EMS Command responds to more than 4,000 calls per day. They range from car accidents to building fires to cats stuck in trees, and responses vary accordingly. Sometimes they might take hours, sometimes they take just a few minutes. With such unpredictable conditions, the FDNY focuses on improving what they call “response time.” That’s the amount of time between a 911 call being made and emergency responders arriving on the scene. This might sound familiar.

How to get started with incident management metrics

Tracking incident metrics can help you discover patterns in the causes and costs of incidents and help you understand brittle parts of your organization. We've seen them help teams zero in on things like: But it can be intimidating to get started. Do you really need metrics if you're a small team or just beginning to formalize your incident management program? I say yes. The key is to start with something manageable and grow.

Forgot to declare an incident? Add it retroactively in FireHydrant.

Have you ever quickly worked through an issue with your team and later thought, “Huh. That probably should have been an incident.” It happened to us just a few weeks back. After one of our engineers surfaced a failed build, a few folks chimed in to problem solve and within 30 minutes things were up and running like normal. But we probably should have declared an incident.

The why and how behind running incident response game days

In any high pressure situation, the key to fast action is preparedness. And that’s true when it comes to incidents, too. Documenting and training your team on your incident response processes is essential to ensuring a coordinated and efficient response effort. And training sessions, or game days, as they’re sometimes called, are one way to get everyone up to speed.

Create a service catalog that grows with you

When your incident response process is centered around a service catalog, responders are able to more quickly pinpoint the service or functionality that’s down, bring in the team or experts, and then get to solving the problem faster. Saving even a few minutes can have a big impact on decreasing the costs around incidents and outages, so having up-to-date service details at your fingertips can make all the difference.

How FireHydrant handled the SVB banking crisis

On Thursday, March 9, 2023, something was afoot at our primary bank, SVB. By Friday, March 10, 2023, messages from our investors helped us quickly understand that FireHydrant needed to maneuver through a complex incident that was unfolding. Operational incidents are incidents like every other.

Automatically Create Incidents from Alerts with Alert Routing

Shouldn’t your alerts be doing more of the work for you? A noisy channel with every alert from hundreds of monitors and microservices is a chaotic place to actually find the incidents that are impacting your customers. And it still requires a heck of a lot of human intervention. We think it’s time for something better. Today we’re releasing Alert Routing: the next phase of worry-free automation from FireHydrant.

How to define roles for your incident response team

Agility matters in incident response, and the easiest way to spring into action is by having a well-defined team in place ahead of time. The right people in the right roles will help you respond to and resolve incidents more quickly and efficiently. In fact, we found in the Incident Benchmark Report that incidents with roles assigned had a 42% lower mean time to resolution than those that didn’t. But what roles do you need to fill?