
Chaos Engineering: How to create an automated Chaos Gauntlet with Gremlin and Jenkins on AWS

In this video, we demonstrate how to use Gremlin and Jenkins to create an automated Chaos Gauntlet. Jenkins Pipeline stages call the Gremlin API to inject a controlled amount of failure. A final stage then lets you optionally halt the attack from the pipeline, rather than waiting for its full duration.
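The stage flow described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the Jenkinsfile or the Gremlin API from the video: `Stage`, `RecordingClient`, `start_attack`, `halt_attack`, and `run_gauntlet` are all hypothetical names, and the client is an in-memory stand-in that makes no network calls.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stage:
    """One pipeline stage: a named attack of a given kind and length (hypothetical shape)."""
    name: str
    kind: str
    length_s: int


class RecordingClient:
    """In-memory stand-in for a chaos API client (assumption: not the real Gremlin client)."""

    def __init__(self) -> None:
        self.active: List[str] = []
        self._count = 0

    def start_attack(self, kind: str, length_s: int) -> str:
        # A real client would send an HTTP request here; this one only records the attack.
        self._count += 1
        attack_id = f"attack-{self._count}"
        self.active.append(attack_id)
        return attack_id

    def halt_attack(self, attack_id: str) -> None:
        # Stop the attack early instead of waiting for its full duration.
        self.active.remove(attack_id)


@dataclass
class GauntletResult:
    started: List[str] = field(default_factory=list)
    halted: List[str] = field(default_factory=list)


def run_gauntlet(client: RecordingClient, stages: List[Stage], halt_last: bool = False) -> GauntletResult:
    """Run each stage in order, then optionally halt the most recent attack."""
    result = GauntletResult()
    last_id: Optional[str] = None
    for stage in stages:
        last_id = client.start_attack(stage.kind, stage.length_s)
        result.started.append(last_id)
    # Optional final stage: halt the attack from the pipeline.
    if halt_last and last_id is not None:
        client.halt_attack(last_id)
        result.halted.append(last_id)
    return result
```

In an actual Jenkins setup each `Stage` would correspond to a pipeline stage that calls the Gremlin API, and the `halt_last` flag would correspond to the optional final stage that an operator approves or skips.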

Chaos Engineering: The Path to Reliability - Kolton Andrus

We’re all here for the same purpose: to ensure the systems we build operate reliably. This is a difficult task, one that must balance people, process, and technology under difficult conditions. We operate with incomplete information, assessing risks and dealing with emerging issues. We’ve found Chaos Engineering to be a valuable tool in addressing these concerns. Learn from real-world examples what works, what doesn’t, and what the future holds.

Identifying Hidden Dependencies - Liz Fong-Jones

You don't need to write automation or deploy on Kubernetes to benefit from resilience engineering! Learn how Honeycomb improved the reliability of our ZooKeeper, Kafka, and stateful storage systems by terminating nodes on purpose. We'll discuss the initial manual experiments we ran, the bugs we uncovered in our automatic replacement tools, and the steps we needed to take to progress towards running the experiments continuously. Today, no node at Honeycomb lives longer than 12 months, and we automatically recycle nodes every week.