Avoid the Chaos Engineering bottleneck
Chaos Engineering is great, but by itself it can create bottlenecks that limit your reliability journey. Find out how to be reliable at scale with Gremlin → https://www.gremlin.com/
FULL TRANSCRIPT:
One of the things we've learned while building Gremlin and being the first Chaos Engineering tool to market is that, for all the greatness that comes with this approach, there are also some downfalls, some drawbacks. And one of those is how you scale this practice.
One of the things we've observed a lot of early customers and adopters in this space do is make this the SRE team's problem, or a single team's problem. That team then needs to go team by team and ask everyone to do the work that needs to be done, correlate the responses, help them debug it, and then present it to leadership.
Well, that's a bit of a bottleneck.
And so what we learned along the way is that we really needed to provide each team with the tools to go run the tests and answer the questions on their own. And we needed a way to collate those results and give leadership visibility into what's happening, so they know where to spend their time, where to hold people accountable, and where to reward the good work.
So within Gremlin's Reliability Management, we built a test suite, which gives engineers an out-of-the-box, already-set-up place to begin.
We integrate with your alerting and your monitoring, so we're able to tell you exactly why your system failed and whether or not you passed that test. And then we're gonna pull this all together into a nice report that you can show your boss, one that shows how you've improved over the last quarter, how you've mitigated many reliability risks, and how you're ultimately more reliable now as a result.