Gremlin

How to find and test critical dependencies with Gremlin

Feb 13, 2025 By Gremlin In Gremlin

Part of the Gremlin Office Hours series: A monthly deep dive with Gremlin experts. Pop quiz: what are all of the dependencies your services rely on? If you're like most engineers, you probably struggled to come up with the answer. Modern applications are complex and rely on dozens (if not hundreds) of dependencies. Many teams track them in spreadsheets, but manual processes like these break down over time. What if you had a tool that found and tracked dependencies for you?

Announcing Gremlin Private Edition

Feb 11, 2025 By Andre Newman In Gremlin

Today, we’re excited to announce Gremlin Private Edition, a version of Gremlin that you can run entirely within your own private network.

How the Gremlin agent fails safely

Jan 30, 2025 By Andre Newman In Gremlin

Testing shouldn't feel risky. While it might sound counterintuitive, certain types of testing can actually increase risks to your systems. Load testing, for example, is a great way to see how your systems behave under pressure, but it can also cause those same systems to fail if they aren't equipped to handle the load. For some types of testing, such as reliability testing and Chaos Engineering, this risk is a necessary part of the process.

How to fix the root cause of a failed reliability test

Jan 21, 2025 By Andre Newman In Gremlin

You're well on your way to becoming more reliable. You've added your services, found and fixed some Detected Risks, and run your first set of reliability tests. However, some of your tests came back as "Failed." Not to worry: this isn't a reflection of you or your engineering skills, but rather an opportunity to learn more about how your systems work and, more importantly, how to make them more resilient.

What's the ROI of reliability?

Jan 13, 2025 By Gavin Cahill In Gremlin

Reliability doesn’t happen by itself. Making a system reliable and resilient enough that your customers can count on it takes a combination of time, effort, and resources that could be used elsewhere, such as shipping new features. It’s also not optional. In an era where downtime costs an average of $14,056/min (or $843,360/hr), outages have a material impact on businesses. Unfortunately, most systems are sprawling and complex enough that even small amounts of downtime can add up quickly.

Maximizing your reliability on AWS

Jan 13, 2025 By Andre Newman In Gremlin

Cloud providers like AWS excel at creating reliable platforms for developers to build on. But while the platforms may be rock-solid, this doesn’t guarantee your applications will be too. It’s the provider’s job to offer stable infrastructure, but you’re still on the hook for making your workloads resilient, recoverable, and fault-tolerant. There’s only one problem: cloud platforms are essentially black boxes.

Manage your reliability work more easily with Gremlin's newest features

Jan 6, 2025 By Andre Newman In Gremlin

Reliability testing is ongoing work, and tracking that work can be difficult in large organizations. Engineers run one-off experiments, scheduled Scenarios run in the background, and, for more mature teams, CI/CD workflows fire off automated tests on demand. According to our own product metrics, teams run between 200 and 500 tests each day on average! With so much happening, it's hard to keep track of everything going on in Gremlin, until now.

Gremlin's 2024 year-end Release Roundup

Dec 18, 2024 By Andre Newman In Gremlin

It's been a busy year at Gremlin! We released two new experiments, added an entirely new onboarding process and new features for AWS users, added a brand-new Test Suite and Detected Risks, and made many UI improvements to our web app. We beefed up our agents with more enterprise capabilities, including support for large Kubernetes clusters and systems with over 64 CPUs, improved experiment behaviors, improved dependency detection, and per-team Private Network Integrations.

Why Gremlin: Today's complex applications need a different approach to reliability

Dec 16, 2024 By Gremlin In Gremlin

Cloud-based distributed applications have changed how we need to approach reliability and resiliency. How do you make your applications reliable? Here’s Gremlin CEO Josh Leslie to tell you how. Today’s dynamic applications are too complex and constantly changing for humans to wrap their heads around. This means the reliability approaches that worked ten years ago simply won’t be enough. As a technology company (and these days, every company is a technology company), you need to take a different, programmatic approach to testing and improving the reliability of your applications.

Test for the common failures that cause 80% of outages with Gremlin

Dec 16, 2024 By Gremlin In Gremlin

80% of failures at the infrastructure layer come from the same core gaps in reliability. Jeff Nickoloff, Gremlin Principal Engineer, goes over how Reliability Management test suites help improve reliability across your organization. Are you waiting for the other reliability shoe to drop and hoping that you actually fixed core resilience issues? Or do you know for sure that you’re resilient to common reliability issues?
