Encourage the boring reliability work

Aug 20, 2025

Proactive, regular reliability work is boring, repetitive, and EFFECTIVE. And if leadership wants the incredible results it brings, they have to encourage the right behavior. Find out how to be reliable at scale with Gremlin → https://www.gremlin.com/

FULL TRANSCRIPT:

In the SRE operations space in general, we have a bit of a firefighter hero culture. And look, that was me, I was a call leader at Amazon. I fixed the Amazon retail website from the side of I-5 next to my motorcycle in the rain. You gotta put in the time, you gotta do that work. You gotta reward the behavior you want. So while you might need that behavior in the short term, the behavior you want in the long term is more boring.

Just like unit testing, integration testing, reliability testing. And so if people are spending an hour a week, an hour a month, they're doing these tests, they're making sure they're passing and they're automating them, they're just not gonna have issues.

Now, what's the incentive for them to do that beyond being a good engineer and wanting to do the right thing? And this is the discussion I have with leadership. At your company, if I go do this great work and you have no outages, will anyone know? And will I get promoted? And if the answer is no, don't ask me why your engineers aren't doing this because the answer's clear.