Operations | Monitoring | ITSM | DevOps | Cloud

Chaos Engineering

SRE's Guide to Chaos & Observability

Today’s distributed, cloud-based environments are incredibly complex. Not only does each component depend on many others, but modern systems are also highly dynamic—changing frequently as teams push new code or make updates to infrastructure. Taming this complexity to ensure reliability requires end-to-end observability to understand how components depend on each other. Additionally, proactive Chaos Engineering combined with AI-driven observability lets you uncover “unknown unknowns” that impact how your system will respond to different failure scenarios.

Building Reliable Applications Webinar 6 17 21

Test-driven development (TDD) is a process that ensures quality in the applications we develop while guarding against feature creep/skew. But as our applications have become increasingly complex, traditional testing methods are not enough. Traditional testing only evaluates what we know, but complex systems often fail due to unknowns—the things that are almost impossible to test because we are unaware of them. Chaos Engineering is the exception that allows us to test for what we don’t know.

Self-service reliability with Internal Developer Platforms and Chaos Engineering

Get started with Gremlin's Chaos Engineering tools to safely, securely, and simply inject failure into your systems to find weaknesses before they cause customer-facing issues. Up until the early 2000s, developers and Ops (at the time IT) had separate and often competing objectives, separate department leadership, separate key performance indicators by which they were judged, and often worked on separate floors or even separate buildings.

Announcing the availability of Gremlin using AWS CloudFormation Public Registry

Get started with Gremlin's Chaos Engineering tools to safely, securely, and simply inject failure into your systems to find weaknesses before they cause customer-facing issues. We’re excited to announce that Gremlin is available on AWS CloudFormation Public Registry.

Gremlin ALFI Demo - AWS RDS Unavailable - Chaos Engineering

In this demo, we'll share how you can use ALFI (Application Level Failure Injection) to make AWS RDS unavailable. This enables you to learn how your application handles different failure modes. We'll be using the ALFI Latency attack to perform this Chaos Engineering experiment.

Announcing the Gremlin Chaos Engineering Practitioner Certificate Program

Get started with Gremlin's Chaos Engineering tools to safely, securely, and simply inject failure into your systems to find weaknesses before they cause customer-facing issues. Chaos Engineering continues to grow in popularity and is rapidly becoming a job requirement. To help Engineering and Testing teams meet the need, we’re launching our first ever Gremlin Chaos Engineering Practitioner Certificate Program!