The Three Arcs of Gremlin
Gremlin CEO Kolton Andrus talks about Gremlin’s journey in helping you find and fix reliability risks. Find out how to be reliable with Gremlin → https://www.gremlin.com/
FULL TRANSCRIPT:
One of the things we've always tried to do at Gremlin is make it easy to do the right thing.
In the first version of the product, we inject the failure; you, the engineer, go do the homework to see if your system responded correctly, and then you, the engineer, go fix the problem. Gremlin's not doing as much as we could in that scenario.
In the second version, Gremlin 2.0, we'll inject the failure, but because we have access to your monitoring and your system, we'll tell you if the system responded well or not. But if there's a problem, you, the engineer, you still need to go figure it out and fix it.
So the third arc is very simple: we want to go help you fix it now. So we want to inject the failure safely and securely. We want to tell you, did your system respond correctly? And if not, how did it respond incorrectly? But then we want to give guidance. We want to tell people concretely, "Hey, here's how to go fix this problem."
I think this is a fun conversation because this could go anywhere from, "Hey, we saw this Java exception in your log lines," where we can search Stack Overflow, etc., and give you some guidance on how to go fix that exception, to deriving patterns, because we have a lot of customer data around testing various failures.
So we can be pretty confident: oh, you need a circuit breaker around this dependency, because this dependency fails often. Or you need a better timeout when you're talking to this dependency, because your requests are timing out when latency grows too high. In a nutshell, that's what we wanna do. We wanna give people actionable, credible advice on how to go fix their system, not just uncover the weaknesses within it.
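For readers unfamiliar with the circuit breaker pattern mentioned above, here is a minimal Python sketch. It is an illustration only, not Gremlin's product or API: the names CircuitBreaker, CircuitOpenError, failure_threshold, and reset_after are all hypothetical, and the thresholds are arbitrary. The idea is that after a dependency fails several times in a row, callers stop waiting on it and fail fast until a cool-down has passed.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    """Hypothetical, single-threaded circuit breaker (illustrative sketch)."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_after = reset_after              # cool-down in seconds
        self.clock = clock                          # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Still cooling down: fail fast instead of calling the dependency.
                raise CircuitOpenError("dependency marked unhealthy; failing fast")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                # Open (or re-open) the circuit and restart the cool-down.
                self.opened_at = self.clock()
            raise
        # Success: close the circuit and clear the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The timeout half of the advice belongs in the wrapped call itself. For example, with the standard library one might write breaker.call(urllib.request.urlopen, url, timeout=2.0), so that a slow dependency raises quickly and counts as a failure rather than holding the request open. The 2.0 second value is an assumption for illustration; a real value would come from the latency the dependency actually shows under test.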