FireHydrant

firehydrant

A single-person on-call "rotation" is a critical vulnerability

One of the most common complaints we hear from operations and site reliability engineers is about the quality-of-life impact of their on-call responsibilities and the stress that results. Most of us are already aware that a proper on-call rotation is critical to our engineering organization’s health in terms of both immediate incident response and long-term sustainable growth.

Open Source can be a silver bullet, but your application might be a werewolf

I was reminiscing with an old co-worker about an incident that happened at a past job. You know the one: you install a library that makes some task of yours simple, only to discover that the library makes things worse. This incident in particular involved the way images were served out of our Ruby on Rails application, and the library that made it possible to “easily resize before serving” them.

Announcing our AWS CloudTrail Integration

One of the most common reasons for system failures is changes to the underlying infrastructure. AWS CloudTrail does a great job of recording when actions are taken, but a lot of organizations don’t take advantage of it. FireHydrant now includes this data, giving you visibility into changes to your infrastructure while you’re investigating an incident.

Dynamic Kubernetes Informers

In the past I’ve written about how to use informers in Kubernetes for particular resources, but what if you need to receive events for any Kubernetes resource dynamically? Well, there’s a client-go package for that too. At FireHydrant, we recently updated our Kubernetes integration to watch changes for any resource you configure, and I wanted to write down how we built it at a high level.

Announcing our Statuspage.io integration

Ever go to a status page and it says everything is operational when it definitely isn’t? You refresh maddeningly, thinking it might be you. You ponder whether the bill for the internet has been paid. Then, as a last resort, you check Twitter, only to discover hundreds of people are experiencing the same problem. This is common, and because of it, we’re happy to release our integration with Statuspage.io!

3 Defensive Programming Techniques for Rails

Incidents happen all the time because of bad code deploys. You write some code that passes code review, it is automatically shipped to production after the test suite passes, and BAM, an outage happens. This is a fairly common occurrence, and there are ways to prevent it entirely. Using some simple ideas, we can defend ourselves from the hidden mistakes that code reviews and chaos engineering sometimes won’t catch.

Announcing Flare: Make opening incidents stress free

We’re launching a new feature today that allows anyone in your organization to kick off your incident response process from Slack, with an appropriate severity level attached. Often people are afraid to open an incident or even share that they’re aware of something going wrong with your applications. When everything is important, nothing is important; users frequently overestimate the impact of an incident and assign an inappropriately high severity level.

So You Want To Give A Tech Talk?

So you’ve signed up to give a tech talk, awesome! You’re a subject matter expert in something and want to share your knowledge, and that’s what helps make a community awesome. You’re going to be speaking in front of a room of people that you don’t know, in a place you’ve likely never been, talking about something you confidently know. Sounds easy, right?