How to test your systems for scalability and redundancy with fault injection

Jun 13, 2025

Part of the Gremlin Office Hours series: a monthly deep dive with Gremlin experts. Join us for the next one: https://www.gremlin.com/officehours

Do you know if your services can tolerate losing a node? What about an entire availability zone? Or a region?

Large-scale outages do happen. When you’re running critical services, it’s vital that they keep running even if an AZ or region fails. Beyond failing over, these services also need to scale quickly so that shifted traffic doesn’t overwhelm the capacity that remains. How do you prove that a service is both scalable and redundant? The answer is Fault Injection.

In this webinar, we’ll show you how to verify the scalability and redundancy of your systems by testing them directly. We’ll use Fault Injection to simulate large-scale failures, use observability tools to monitor how our systems respond, and discuss how to use our findings to make those systems more resilient.
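
To make the link between failover and scaling concrete, here is a minimal back-of-the-envelope sketch. It is not taken from the webinar: the function names and traffic figures are hypothetical, and it assumes traffic is balanced evenly across zones. It estimates how much load each surviving zone has to absorb when one zone is lost.

```python
def load_after_zone_loss(total_rps: float, zones: int, lost: int = 1) -> float:
    """Requests per second each surviving zone must serve.

    Assumes (for illustration only) that traffic is spread evenly
    across all surviving zones.
    """
    surviving = zones - lost
    if surviving <= 0:
        raise ValueError("no surviving zones left to absorb traffic")
    return total_rps / surviving


def survives_zone_loss(total_rps: float, zones: int,
                       per_zone_capacity_rps: float, lost: int = 1) -> bool:
    """True if the surviving zones can absorb the shifted traffic."""
    return load_after_zone_loss(total_rps, zones, lost) <= per_zone_capacity_rps


# Hypothetical numbers: 9,000 requests/s across 3 zones (3,000 each).
print(load_after_zone_loss(9000, 3))      # 4500.0 per zone after losing one
print(survives_zone_loss(9000, 3, 4000))  # False: each zone tops out at 4,000
```

A calculation like this only gives an estimate to compare against. A Fault Injection test is what shows whether the system actually scales to that level in time.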

About Gremlin Office Hours:
See Gremlin in action as one of our experts guides you through the platform in our monthly interactive session. You’ll be able to get your questions answered during the Q&A segment.

Check out previous Office Hours on-demand or sign up to join the next one at https://www.gremlin.com/officehours