Chaos Engineering

What your company can learn from the Bank of England's resilience proposal

Learn how to modernize your financial systems with confidence while mitigating risk (and meeting compliance). This article was originally published on TechCrunch. The outages at RBS, TSB, and Visa left millions of people unable to deposit their paychecks, pay their bills, acquire new loans, and more.

How many 9's are enough? Kolton Andrus, CTO Connection: Reducing engineering cycle time

How many nines of availability are enough? In this talk, Gremlin CEO Kolton Andrus shares insights from his years at Amazon and Netflix, and from his current work with a wide array of customers across various disciplines and industries. He’ll describe what each level of availability looks like, the challenges faced at each stage, and the trade-offs required to achieve the next nine of uptime.
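To make "each level of availability" concrete, here is a minimal sketch of the arithmetic behind the nines. It is not taken from the talk; the function name `downtime_minutes_per_year` is hypothetical, and the calculation assumes a 365-day year and that all downtime counts against the availability target.

```python
# Minutes in a non-leap year (assumption: 365 days).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(nines: int) -> float:
    """Return the yearly downtime budget for `nines` nines of availability.

    For example, 3 nines means 99.9% availability, i.e. 0.1% unavailability.
    """
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 6):
        availability = 100 * (1 - 10 ** -n)
        budget = downtime_minutes_per_year(n)
        print(f"{n} nines ({availability:.3f}%): {budget:.1f} minutes of downtime per year")
```

Each additional nine cuts the downtime budget by a factor of ten, which is one way to see why every further nine demands new trade-offs.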

Is your Grafana dashboard ready to spot chaos?

When it comes to systems reliability, you wouldn’t normally think that unleashing additional chaos would actually be helpful, would you? As more engineering teams moved toward microservice-based architectures for cloud applications over the course of this past decade, many of them didn’t change their testing strategies.

Chaos Engineering: Finding Failures Before They Become Outages

Learn the basics of Chaos Engineering: discover the tools, tests, and culture needed to create better software and prevent outages and downtime. This whitepaper provides a comprehensive introduction to the discipline of Chaos Engineering including why it is more needed than ever, how to get started, and best practices to maximize learnings and reduce risk.

How to Implement Chaos Engineering at Your Company

By following this guide, you'll successfully increase your organization's reliability with minimal effort and risk. This document will serve as your guide to implementing Chaos Engineering and Gremlin within your organization. From educating your team on the principles of Chaos Engineering to running automated experiments, this guide will walk through each stage of the adoption process in order to ensure a smooth and successful rollout.

Chaos Engineering for DynamoDB

Amazon DynamoDB is fast, powerful, and intended for high availability. These are all valuable attributes in a data storage solution, but to be useful as advertised, it must be configured thoughtfully. Learn how to use Chaos Engineering to ensure DynamoDB performs the way you expect. Amazon DynamoDB is one of the most popular NoSQL databases and is the data store of choice for many teams running production workloads in AWS.

Announcing Chaos Conf 2020 (Online): Be Prepared For Moments That Matter

We’re excited to announce the third annual Chaos Conf! Given the events with COVID-19 this year, we will be holding this event fully online for the health and safety of attendees. The unforeseen impact of this virus on our lives, our businesses, and our software highlights the importance of preparing for the unexpected. Our theme for this year highlights this: Prepare for moments that matter. Chaos Conf will take place over the course of three days: October 6–8.

Gremlin User Newsletter: AWS App2Container, an update to the WAF, and what's new in Gremlin

As systems become increasingly complex, we’ve seen the growth of engineering tools to abstract away and manage the complexity. But often our tools are “opinionated,” and the default actions or settings may not align with how our systems are intended to work or how we think they work. Chaos Engineering is a good way to test not only your applications but also the tools you use to build them.

Ensuring reliability when modernizing financial applications

For decades, information technology in the financial services industry meant deploying bulky applications onto monolithic systems like mainframes. These systems have a proven track record of reliability, but don’t offer the flexibility and scalability of more modern architectures such as microservices and cloud computing. During periods of unexpectedly high demand, this inflexibility can cause technical issues for organizations ranging from personal trading platforms to major banks.

Testing the reliability of your fulfillment center

Fulfillment pipelines for order management in e-commerce have a lot of intricate moving parts that depend on one another. Sales orders, fulfillment, negotiation, shipment, and receipt are closely interconnected, yet each requires different actions. You also need messaging around order statuses, conditions, actions, rules, and inventory, just to name a few of the important parts of these complex systems.