
July 2020

Chaos Engineering: Finding Failures Before They Become Outages

Learn the basics of Chaos Engineering: discover the tools, tests, and culture needed to create better software and prevent outages and downtime. This whitepaper provides a comprehensive introduction to the discipline of Chaos Engineering, including why it is needed more than ever, how to get started, and best practices to maximize learnings and reduce risk.

How to Implement Chaos Engineering at Your Company

By following this guide, you'll successfully increase your organization's reliability with minimal effort and risk. This document will serve as your guide to implementing Chaos Engineering and Gremlin within your organization. From educating your team on the principles of Chaos Engineering to running automated experiments, this guide will walk through each stage of the adoption process in order to ensure a smooth and successful rollout.

Chaos Engineering for DynamoDB

Amazon DynamoDB is fast, powerful, and intended for high availability. These are all valuable attributes in a data storage solution, but to perform as advertised, it must be configured thoughtfully. Learn how to use Chaos Engineering to ensure DynamoDB performs the way you expect. Amazon DynamoDB is one of the most popular NoSQL databases and is the data store of choice for many teams running production workloads in AWS.

Announcing Chaos Conf 2020 (Online): Be Prepared For Moments That Matter

We’re excited to announce the third annual Chaos Conf! Given the COVID-19 pandemic this year, we will hold this event fully online for the health and safety of attendees. The unforeseen impact of this virus on our lives, our businesses, and our software highlights the importance of preparing for the unexpected. Our theme for this year reflects this: Prepare for moments that matter. Chaos Conf will take place over three days: October 6–8.

Gremlin User Newsletter: AWS App2Container, an update to the WAF, and what's new in Gremlin

As systems become increasingly complex, we’ve seen the growth of engineering tools that abstract away and manage that complexity. But our tools are often “opinionated,” and their default actions or settings may not align with how our systems are intended to work or how we think they work. Chaos Engineering is a good way to test not only your applications but also the tools you use to build them.

Ensuring reliability when modernizing financial applications

For decades, information technology in the financial services industry meant deploying bulky applications onto monolithic systems like mainframes. These systems have a proven track record of reliability, but don’t offer the flexibility and scalability of more modern architectures such as microservices and cloud computing. During periods of unexpectedly high demand, this inflexibility can cause technical issues for organizations ranging from personal trading platforms to major banks.

Testing the reliability of your fulfillment center

Fulfillment pipelines for order management in e-commerce have many intricate moving parts. Sales orders, fulfillment, negotiation, shipment, and receipt are closely interconnected but require different actions. You also need messaging around order statuses, conditions, actions, rules, and inventory, to name just a few of the important parts of these complex systems.

What's the reliability of your checkout process?

One of the reasons companies practice Chaos Engineering is to prevent expensive outages in retail (or anywhere, for that matter). This blog post walks through a common retail outage where the checkout process fails, then covers how to use Chaos Engineering to prevent that outage from happening in the first place. Let’s dive in. Maybe you’ve been there.

Building more reliable financial systems with Chaos Engineering

The financial services industry has built in more capital buffers to prevent market shocks from bringing about another economic collapse. In addition to these financial controls, many banks and personal trading platforms have begun building resiliency against information technology shocks. Despite these new precautions, we’re still seeing outages today that prevent customers from depositing and withdrawing their money, completing transactions, and executing trades during key events.

How to Convince Your Organization to Adopt Chaos Engineering

Convince your coworkers and management to explore and adopt Chaos Engineering and Site Reliability Engineering (SRE). This playbook provides ideas and techniques you can use to articulate the need and benefits to internal stakeholders in your organization. It also guides the initial implementation in a way that will lead to success and growth across the organization. Successfully implementing something new like Chaos Engineering is a good way to get promoted and help the organization succeed, and this guide is here to help you.

Chaos Engineering for MongoDB

MongoDB is designed for performance, scale, and high availability. But, as with any software, you need to test your configuration to verify that it will work as advertised. Ensure that MongoDB performs the way you expect by using Chaos Engineering to test four key features. This guide includes four experiment tutorials to verify that MongoDB will perform reliably. To get the most out of MongoDB's rich features, including built-in data sharding and replication, it's crucial to test your configuration.