
FireHydrant

New MTTX analytics to drive your reliability roadmap

Analytics are great. We can all agree there. But not all analytics are created equal. FireHydrant has long offered incident analytics dashboards that provide an in-depth look at the entire incident lifecycle. You can see how incidents impact services and teams, understand retrospective participation and completion, and even get insight into follow-ups. But great analytics do more than simply organize data. They help you tell a story.

The revolution in critical incident response at Dock: efficient integration and service improvement

In this article, we will explore how Dock is working to significantly enhance its response time to critical incidents, emphasizing effective integration between tools as key to success. We will address how we challenge the conventional approach by shifting the focus from Mean Time to Acknowledge (MTTA) to Mean Time to Combat (MTTC), a customized metric that measures the time between incident detection and effective communication involving professionals capable of resolving it.
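As a rough illustration of how the two metrics differ, here is a minimal sketch that computes both from the same incident records. The record layout and field names (`detected`, `acknowledged`, `responders_engaged`) and the sample timestamps are assumptions made for this example, not Dock's actual data model.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records; field names and values are illustrative only.
incidents = [
    {
        "detected": datetime(2024, 1, 5, 10, 0),
        "acknowledged": datetime(2024, 1, 5, 10, 2),
        "responders_engaged": datetime(2024, 1, 5, 10, 12),
    },
    {
        "detected": datetime(2024, 1, 9, 22, 30),
        "acknowledged": datetime(2024, 1, 9, 22, 34),
        "responders_engaged": datetime(2024, 1, 9, 22, 50),
    },
]


def mean_minutes(records, start_key, end_key):
    """Average elapsed minutes between two timestamps across all records."""
    return mean(
        (r[end_key] - r[start_key]).total_seconds() / 60 for r in records
    )


# MTTA: detection until someone acknowledges the alert.
mtta = mean_minutes(incidents, "detected", "acknowledged")

# MTTC: detection until people able to resolve the incident are in communication.
mttc = mean_minutes(incidents, "detected", "responders_engaged")

print(f"MTTA: {mtta:.1f} min, MTTC: {mttc:.1f} min")
```

In this made-up data, alerts are acknowledged within minutes, yet it takes considerably longer before the right people are actually talking to each other, which is the gap MTTC is meant to expose.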

The alert fatigue dilemma: A call for change in how we manage on-call

Once the unsung heroes of the digital realm, engineers are now caught in a cycle of perpetual interruptions thanks to alerting systems that haven't kept pace with evolving needs. A constant stream of notifications has turned on-call duty into a source of frustration, stress, and poor work-life balance. In 2021, 83% of software engineers surveyed reported feelings of burnout from high workloads, inefficient processes, and unclear goals and targets.

Better Incidents Winter Bonfire: Inside On-Call

Engineers are bombarded with pages left and right. There's uncertainty about how to escalate. The line between what's urgent and what can wait is a constant blur. This never-ending ping-pong game takes a toll. Burnout creeps in, and before you know it, your engineering culture has taken a nosedive.

Now in beta: alerting for modern DevOps teams

Although FireHydrant has spent five years focused on what happens after your team (erg, I mean service 🙄) gets paged, the topic of alerting often comes up in discussions with our community. People are tired of paying big bucks for software that’s expensive, bloated, and hasn’t seen much innovation. Clearly, there’s a problem here – and we’re tackling it head on.

Captain's Log: Diving into our scheduling design

On-call scheduling is tricky. Like, really tricky. It was one of the scariest parts when we decided to build a modern alerting system earlier this year. We knew we couldn't cut any corners on Day One of our release because it needed to be a fully loaded feature for someone to realistically use our product (and replace an incumbent). This meant including windowed restrictions, coverage requests, and simple to complex rotations.

Your guide to better incident status pages

Your status page (or lack thereof) has the opportunity to signal a lot about your brand — how transparent you are, how quickly you respond to incidents, how you communicate with your customers — and ultimately, this all seriously impacts your reliability. After all, as our CEO Robert put it in a recent interview on the SRE Path podcast, you don’t get to decide your reliability; your customers do.

Captain's Log: How we are leveraging CEL for Signals

As engineers, we didn't want to make Signals only a replacement for what the existing incumbents do today. We've had our own gripes for years about the information architecture many old companies still force you to implement today. You should be able to send us any signal from any data source and create an alert based on some conditions. We're no strangers to building features that include conditional logic, but we upped the ante when it came to Signals.

Captain's Log: A first look at our architecture for Signals

Welcome to the first Signals Captain’s Log! My name is Robert, and I’m a recovering on-call engineer and the CEO of FireHydrant. When we started our journey of building Signals, a viable replacement for PagerDuty, OpsGenie, etc., we decided very early that we would tell everyone what makes Signals unique, and what better way than to tell you how we’re building it (without revealing too much 😉). Let’s jump in.