Operations | Monitoring | ITSM | DevOps | Cloud

Latest News

Announcing the availability of Gremlin using AWS CloudFormation Public Registry

Get started with Gremlin's Chaos Engineering tools to safely, securely, and simply inject failure into your systems to find weaknesses before they cause customer-facing issues. We’re excited to announce that Gremlin is available on AWS CloudFormation Public Registry.

Announcing the Gremlin Chaos Engineering Practitioner Certificate Program

Get started with Gremlin's Chaos Engineering tools to safely, securely, and simply inject failure into your systems to find weaknesses before they cause customer-facing issues. Chaos Engineering continues to grow in popularity and is rapidly becoming a job requirement. To help Engineering and Testing teams meet the need, we’re launching our first ever Gremlin Chaos Engineering Practitioner Certificate Program!

Podcast: Break Things on Purpose | Jose Nino, Staff Software Engineer at Lyft

Get started with Gremlin's Chaos Engineering tools to safely, securely, and simply inject failure into your systems to find weaknesses before they cause customer-facing issues. Break Things on Purpose is a podcast for all-things Chaos Engineering. Check out our latest episode below. You can subscribe to Break Things on Purpose wherever you get your podcasts. If you have feedback about the show, find us on Twitter at @BTOPpod or shoot us a note at podcast@gremlin.com!

Failover Conf follow-up: Your team and culture questions answered!

Thank you all for joining us last week for Failover Conf 2! We had a great turnout this year, with over 1,800 participants, 20 sponsors, and 9 amazing sessions. After more than a year of virtual events and video calls, we know that Zoom fatigue is real. We tried to make this event different by finding new ways to bring the community together and thinking of fun new ways to shake up the conference formula.

Announcing Services Discovery for tracking and improving service reliability

Gremlin helps teams proactively improve the reliability of their systems by running chaos experiments on infrastructure including hosts, containers, and Kubernetes clusters. But as microservice-based architectures and automated cloud platforms become the norm, engineers are shifting their focus from managing infrastructure to managing services. In order to keep these services as resilient as possible, they need tools that can help them find failure modes, reduce incidents, and improve availability.

Announcing role based access control for API keys for more control over automation

Today, Gremlin is excited to announce the ability to create an API key that can perform actions with the same set of permissions as your user account. This allows you to automate Gremlin tasks safely and securely.

How Netflix Uses Fault Injection To Truly Understand Their Resilience

Distributed systems such as microservices have defined software engineering over the last decade. The majority of advancements have been in increasing resilience, flexibility, and rapidity of deployment at increasingly larger scales. For streaming giant Netflix, the migration to a complex cloud based microservices architecture would not have been possible without a revolutionary testing method known as fault injection. With tools like chaos monkey, Netflix employs a cutting edge testing toolkit.