Gremlin

San Jose, CA, USA
2016
Jul 27, 2021   |  By Jason Yee
Break Things on Purpose is a podcast for all things Chaos Engineering. In this episode, we speak with Paul Marsicovetere, Senior Cloud Infrastructure Engineer at Formidable.
Jul 13, 2021   |  By Jason Yee
In this episode of the Break Things on Purpose podcast, we speak with Taylor Dolezal, Senior Developer Advocate at HashiCorp.
Jun 30, 2021   |  By Andre Newman
Get started with Gremlin's Chaos Engineering tools to safely, securely, and simply inject failure into your systems to find weaknesses before they cause customer-facing issues. Up until the early 2000s, developers and Ops (at the time IT) had separate and often competing objectives, separate department leadership, separate key performance indicators by which they were judged, and often worked on separate floors or even separate buildings.
Jun 29, 2021   |  By Jason Yee
In this episode of the Break Things on Purpose podcast, we ask our guests for their strong opinions.
Jun 21, 2021   |  By Andre Newman
We’re excited to announce that Gremlin is available on the AWS CloudFormation Public Registry.
Jun 15, 2021   |  By Jason Yee
In this episode of the Break Things on Purpose podcast, we celebrate Terraform's 1.0 release with Taylor Dolezal.
Jun 8, 2021   |  By Tammy Butow
Chaos Engineering continues to grow in popularity and is rapidly becoming a job requirement. To help Engineering and Testing teams meet the need, we’re launching our first ever Gremlin Chaos Engineering Practitioner Certificate Program!
Jun 7, 2021   |  By Gremlin
Updated June 7, 2021
May 18, 2021   |  By Jason Yee
Break Things on Purpose is a podcast for all things Chaos Engineering. Check out our latest episode below. You can subscribe to Break Things on Purpose wherever you get your podcasts. If you have feedback about the show, find us on Twitter at @BTOPpod or shoot us a note at podcast@gremlin.com!
May 4, 2021   |  By James Thigpen
Thank you all for joining us last week for Failover Conf 2! We had a great turnout this year, with over 1,800 participants, 20 sponsors, and 9 amazing sessions. After more than a year of virtual events and video calls, we know that Zoom fatigue is real. We tried to make this event different by finding new ways to bring the community together and thinking of fun new ways to shake up the conference formula.
Jul 21, 2021   |  By Gremlin
Chaos Engineering in 60 Seconds - Kubernetes Blackhole Attack
Jul 21, 2021   |  By Gremlin
Chaos Engineering in 60 Seconds - Kubernetes Latency Attack
Jul 21, 2021   |  By Gremlin
Chaos Engineering in 60 Seconds - Process Killer Attack
Jul 20, 2021   |  By Gremlin
Black swan events are inherently unpredictable—you can’t prepare for every possible threat. Instead, you must identify the ways systems can fail and develop strategies to restore them to full service when these failures happen. But a disaster recovery plan (DRP) can’t be relied on until it’s been proven to work. The use of Chaos Engineering allows you to test your DRP much more safely and predictably than you could otherwise.
Jul 20, 2021   |  By Gremlin
Today’s distributed, cloud-based environments are incredibly complex. Not only does each component depend on many others, but modern systems are also highly dynamic—changing frequently as teams push new code or make updates to infrastructure. Taming this complexity to ensure reliability requires end-to-end observability to understand how components depend on each other. Additionally, proactive Chaos Engineering combined with AI-driven observability lets you uncover “unknown unknowns” that impact how your system will respond to different failure scenarios.
Jul 20, 2021   |  By Gremlin
Test-driven development (TDD) is a process that ensures quality in the applications we develop while guarding against feature creep/skew. But as our applications have become increasingly complex, traditional testing methods are not enough. Traditional testing only evaluates what we know, but complex systems often fail due to unknowns—the things that are almost impossible to test because we are unaware of them. Chaos Engineering is the exception that allows us to test for what we don’t know.
Jun 9, 2021   |  By Gremlin
In this demo, we'll share how you can use ALFI (Application Level Failure Injection) to make AWS RDS unavailable. This enables you to learn how your application handles different failure modes. We'll be using the ALFI Latency attack to perform this Chaos Engineering experiment.
May 7, 2021   |  By Gremlin
In this presentation, Tammy shares important failure modes to consider when responsible for the reliability of Kubernetes in your organization.
Apr 29, 2021   |  By Gremlin
Matt Stratton, host of the Arrested DevOps podcast, will host Jeff Smith, Director of Production Operations at Centro and author of the book "Operations Anti-patterns, DevOps Solutions" for an engaging conversation about building reliable teams using DevOps principles.
Jul 25, 2020   |  By Gremlin
Learn the basics of Chaos Engineering: discover the tools, tests, and culture needed to create better software and prevent outages and downtime. This whitepaper provides a comprehensive introduction to the discipline of Chaos Engineering including why it is more needed than ever, how to get started, and best practices to maximize learnings and reduce risk.
Jul 25, 2020   |  By Gremlin
By following this guide, you'll successfully increase your organization's reliability with minimal effort and risk. This document will serve as your guide to implementing Chaos Engineering and Gremlin within your organization. From educating your team on the principles of Chaos Engineering to running automated experiments, this guide will walk through each stage of the adoption process in order to ensure a smooth and successful rollout.
Jul 25, 2020   |  By Gremlin
Amazon DynamoDB is fast, powerful, and intended for high availability. These are all valuable attributes in a data storage solution, but to be useful as advertised, it must be configured thoughtfully. Learn how to use Chaos Engineering to ensure DynamoDB performs the way you expect. Amazon DynamoDB is one of the most popular NoSQL databases and is the data store of choice for many teams running production workloads in AWS.
Jul 1, 2020   |  By Gremlin
Win over and convince your coworkers and management to explore and adopt Chaos Engineering and Site Reliability Engineering (SRE). The playbook provides ideas and techniques that can be used to articulate the need and benefits to internal stakeholders in your organization. It also guides the initial implementation in a way that will lead to success and growth across the organization. Implementing something new like Chaos Engineering successfully is a good way to get promoted and help the organization succeed, and this guide is here to help you.
Jul 1, 2020   |  By Gremlin
MongoDB is designed for performance, scale, and high availability. But as with any software, you need to test your configuration to verify that it will work as advertised. Ensure that MongoDB performs the way you expect by using Chaos Engineering to test four key features. This guide includes four experiment tutorials to verify that MongoDB will perform reliably. To get the most out of MongoDB's rich features, including built-in data sharding and replication, it's crucial to test your configuration.

Gremlin aims to make the internet more reliable and prevent costly and reputation-damaging outages. Its failure-as-a-service platform empowers engineers to build more resilient systems through safe experimentation.

Downtime is expensive and can hurt your brand. Gremlin provides engineers with the framework to safely, securely, and easily simulate real outages with an ever-growing library of attacks. Turn failure into resilience with chaos engineering.

Build resilient infrastructure:

  • Resource Gremlins: Throttle CPU, memory, I/O, and disk.
  • State Gremlins: Reboot hosts, kill processes, travel in time.
  • Network Gremlins: Introduce latency, blackhole traffic, lose packets, fail DNS.

Test for application failure:

  • Test for failure in your code.
  • Fail or delay serverless functions.
  • Narrow the impact to a single user, device, or percentage of traffic.
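
To make the idea of narrowing the impact concrete, here is a minimal Python sketch of application-level latency injection that targets either a single user or a fixed percentage of traffic. It does not use Gremlin's SDK or API: the names `LatencyExperiment`, `targets`, and `with_injected_latency` are hypothetical and exist only for illustration, and hashing the user ID into a bucket is one assumed way to select a stable slice of traffic.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class LatencyExperiment:
    """Hypothetical experiment definition (not a Gremlin API object)."""

    delay_seconds: float               # latency added to each targeted request
    percent_of_traffic: float          # 0-100, share of users to impact
    target_user: Optional[str] = None  # if set, impact only this one user

    def targets(self, user_id: str) -> bool:
        """Return True if this user falls inside the experiment's impact."""
        if self.target_user is not None:
            return user_id == self.target_user
        # Hash the user ID into a stable bucket in [0, 100) so the same
        # user is consistently inside or outside the experiment.
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        bucket = (int.from_bytes(digest[:8], "big") % 10000) / 100.0
        return bucket < self.percent_of_traffic


def with_injected_latency(
    handler: Callable[..., Any],
    experiment: LatencyExperiment,
    sleep: Callable[[float], Any] = time.sleep,
) -> Callable[..., Any]:
    """Wrap a request handler so targeted users experience added latency.

    The sleep function is injectable so the wrapper can be tested
    without actually waiting.
    """

    def wrapped(user_id: str, *args: Any, **kwargs: Any) -> Any:
        if experiment.targets(user_id):
            sleep(experiment.delay_seconds)
        return handler(user_id, *args, **kwargs)

    return wrapped


if __name__ == "__main__":
    def get_profile(user_id: str) -> dict:
        return {"user": user_id}

    # Delay requests for a single user only; everyone else is unaffected.
    experiment = LatencyExperiment(
        delay_seconds=0.05, percent_of_traffic=0.0, target_user="alice"
    )
    slow_get_profile = with_injected_latency(get_profile, experiment)
    print(slow_get_profile("alice"))
    print(slow_get_profile("bob"))
```

Keeping the targeting decision deterministic means a user who is inside the experiment stays inside it for its whole duration, which makes the observed behavior easier to reason about than random per-request sampling.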

Avoid downtime. Use Gremlin to turn failure into resilience.