Reliability testing creates important conversations

Dec 9, 2025

Without reliability tests, you’re left in the dark about how your system will react to failures. That blind spot leaves you open to outages and prevents the vital conversations that help improve your reliability. Find out how to be reliable with Gremlin → https://www.gremlin.com/

Full transcript:
Every week we review the reliability scores of all of our services together, so we can see not only how the service scores stack up against each other, but also how they stack up against the previous week.

And the scores are a great way to get an aggregate look at just what has changed between all of those pieces. And so it's a great conversation starter to go and investigate, "Hey, why did the score go down from last week? Was it a particular set of tests that had failed? Have we introduced new dependencies that have not yet been tested?"
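To make that weekly comparison concrete, here is a minimal sketch of how a team might line up two weeks of scores and pull out the talking points. Everything in it is an assumption for illustration: the `WeeklySnapshot` type, the `score_drops` function, the service names, and the numbers are hypothetical, and none of it is Gremlin's API or scoring method.

```python
from dataclasses import dataclass, field


@dataclass
class WeeklySnapshot:
    """Hypothetical record of one service's state for one week."""
    score: int
    failed_tests: set = field(default_factory=set)
    untested_dependencies: set = field(default_factory=set)


def score_drops(previous, current):
    """Return services whose score fell since last week, biggest drop first.

    Each finding carries the breadcrumbs mentioned in the talk: tests that
    started failing this week and dependencies that have not been tested yet.
    """
    findings = []
    for service, now in current.items():
        before = previous.get(service)
        # Skip services that are new this week or whose score did not drop.
        if before is None or now.score >= before.score:
            continue
        findings.append({
            "service": service,
            "delta": now.score - before.score,
            "newly_failing_tests": sorted(now.failed_tests - before.failed_tests),
            "untested_dependencies": sorted(now.untested_dependencies),
        })
    # Most negative delta first, so the largest drop leads the discussion.
    findings.sort(key=lambda f: f["delta"])
    return findings


# Made-up data for two hypothetical services.
previous = {
    "checkout": WeeklySnapshot(85, {"latency"}),
    "search": WeeklySnapshot(90),
}
current = {
    "checkout": WeeklySnapshot(72, {"latency", "zone-failover"}, {"payments-api"}),
    "search": WeeklySnapshot(92),
}

for finding in score_drops(previous, current):
    print(finding)
```

The output is only a starting point for the review. It tells the team which service to look at and which failing tests or untested dependencies to ask about, not what the right answer is.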

It's the conversations around figuring out why that score has changed that lead to the natural questions about what my system should do in these situations. It's very much a team activity because that's where all of those great questions come from.

It's starting with something very simple, seeing a score change and then digging into the breadcrumbs around failing tests and other aspects of the service that changed so that you can really get to the conversations that need to be had around: what do I need to think about when it comes to a particular failure scenario?