Chaos Engineering


Is your Grafana dashboard ready to spot chaos?

When it comes to system reliability, you wouldn’t normally think that unleashing additional chaos would actually help, would you? As more engineering teams moved toward microservice-based architectures for cloud applications over the past decade, many of them didn’t change their testing strategies.

Chaos Engineering for DynamoDB

Amazon DynamoDB is one of the most popular NoSQL databases and is the data store of choice for many teams running production workloads in AWS. It is fast, powerful, and intended for high availability. These are all valuable attributes in a data storage solution, but to perform as advertised, DynamoDB must be configured thoughtfully. Learn how to use Chaos Engineering to ensure DynamoDB performs the way you expect.

How to Implement Chaos Engineering at Your Company

This document will serve as your guide to implementing Chaos Engineering and Gremlin within your organization. From educating your team on the principles of Chaos Engineering to running automated experiments, it walks through each stage of the adoption process to ensure a smooth and successful rollout. By following this guide, you'll increase your organization's reliability with minimal effort and risk.

Chaos Engineering: Finding Failures Before They Become Outages

Learn the basics of Chaos Engineering: discover the tools, tests, and culture needed to build better software and prevent outages and downtime. This whitepaper provides a comprehensive introduction to the discipline of Chaos Engineering, including why it is needed now more than ever, how to get started, and best practices to maximize learnings and reduce risk.

Announcing Chaos Conf 2020 (Online): Be Prepared For Moments That Matter

We’re excited to announce the third annual Chaos Conf! Given the COVID-19 pandemic this year, we will hold the event fully online for the health and safety of attendees. The unforeseen impact of this virus on our lives, our businesses, and our software highlights the importance of preparing for the unexpected, and this year's theme reflects that: Prepare for moments that matter. Chaos Conf will take place over three days, October 6–8.


Ensuring reliability when modernizing financial applications

For decades, information technology in the financial services industry meant deploying bulky applications onto monolithic systems like mainframes. These systems have a proven track record of reliability, but don’t offer the flexibility and scalability of more modern architectures such as microservices and cloud computing. During periods of unexpectedly high demand, this inflexibility can cause technical issues for organizations ranging from personal trading platforms to major banks.


Gremlin User Newsletter: AWS App2Container, an update to the WAF, and what's new in Gremlin

As systems become increasingly complex, we’ve seen a growth in engineering tools that abstract away and manage that complexity. But our tools are often “opinionated,” and their default actions or settings may not align with how our systems are intended to work or how we think they work. Chaos Engineering is a good way to test not only your applications but also the tools you use to build them.


Testing the reliability of your fulfillment center

Fulfillment pipelines for order management in e-commerce have many intricate, interdependent moving parts. Sales orders, fulfillment, negotiation, shipment, and receipt are closely interconnected, yet each requires different actions. You also need messaging around order statuses, conditions, actions, rules, and inventory, to name just a few of the important parts of these complex systems.


What's the reliability of your checkout process?

One of the reasons companies practice Chaos Engineering is to prevent expensive outages in retail (or anywhere, for that matter) from happening in the first place. This blog post walks through a common retail outage in which the checkout process fails, then covers how to use Chaos Engineering to keep that outage from ever occurring. Let’s dive in.


Building more reliable financial systems with Chaos Engineering

The financial services industry has built in larger capital buffers to keep market shocks from triggering another economic collapse. In addition to these financial controls, many banks and personal trading platforms have begun building resilience against information technology shocks. Despite these precautions, we still see outages today that prevent customers from depositing and withdrawing money, completing transactions, and executing trades during key events.