Latest Posts

gremlin

Failover Conf follow-up: Your team and culture questions answered!

Thank you all for joining us last week for Failover Conf 2! We had a great turnout this year, with over 1,800 participants, 20 sponsors, and 9 amazing sessions. After more than a year of virtual events and video calls, we know that Zoom fatigue is real. We tried to make this event different by finding new ways to bring the community together and shake up the usual conference formula.

gremlin

Announcing Services Discovery for tracking and improving service reliability

Gremlin helps teams proactively improve the reliability of their systems by running chaos experiments on infrastructure, including hosts, containers, and Kubernetes clusters. But as microservice-based architectures and automated cloud platforms become the norm, engineers are shifting their focus from managing infrastructure to managing services. To keep these services as resilient as possible, they need tools that help them find failure modes, reduce incidents, and improve availability.

gremlin

Announcing role-based access control for API keys for more control over automation

Today, Gremlin is excited to announce the ability to create an API key that can perform actions with the same set of permissions as your user account. This allows you to automate Gremlin tasks safely and securely.

gremlin

Announcing our latest attacks to deal with meeting fatigue

With everyone working remotely, video conference tools like Zoom have been a critical part of maintaining business continuity. It’s truly amazing that we can continue to work and connect with one another, even during a time when getting together in an office hasn’t been possible…

gremlin

Validating the resilience of your API gateway with Chaos Engineering

API gateways are a critical component of distributed systems and cloud-native deployments. They perform many important functions including request routing, caching, user authentication, rate limiting, and metrics collection. However, this means that any failures in your API gateway can put your entire deployment at risk.

gremlin

How to test for expired TLS/SSL certificates using Gremlin

Transport Layer Security (TLS) and its predecessor, Secure Sockets Layer (SSL), are essential components of the modern Internet. By encrypting network communications, TLS protects both users and organizations from exposing their in-transit data to third parties. This protection is especially important on the web, where TLS secures HTTP traffic (HTTPS) between backend servers and customers’ browsers.

gremlin

Tyler Wells on building a culture of reliability at Twilio

What does reliability look like at a company that has thousands of employees and provides critical communication services to over 150,000 customers? We talked with Tyler Wells, Senior Director of Engineering at Twilio, to learn how he and his team created a culture of reliability at Twilio. He talked in depth about his experiences developing reliability goals, building reliability practices, and aligning engineering teams on these objectives.