Best Practices for Testing Zone Redundancy

Oct 16, 2024 By Sam Rossoff In Gremlin

As the story goes, in the old days Amazon used to cut power to its data centers to see whether its services were actually redundant across them, and it only abandoned the practice when EC2 customers started to complain (no matter how many times they were warned their instances might disappear without notice). This story may be apocryphal, but power loss is far from the only reason a given data center can go down.

Top 7 Kubernetes Chaos Engineering Tools

Oct 10, 2024 By Ken Ahrens In Speedscale

Developing highly resilient Kubernetes deployments is crucial for ensuring that the applications you host in Kubernetes can effectively manage and recover from disruptions. This capability is vital for maintaining continuous availability for your customers. The importance of resilience in your distributed system also grows depending on your customer base and how critical your application is. Even brief periods of downtime can have a significant negative impact on your business.

Office Hours: How to test serverless applications using Failure Flags

Oct 10, 2024 By Gremlin In Gremlin

Part of the Gremlin Office Hours series: A monthly deep dive with Gremlin experts. Serverless applications are ideal for deploying scalable applications without having to manage infrastructure. However, this also makes it difficult to test their reliability. It’s easy to simulate a network outage or latency when you have direct access to the host that your software’s running on. What do you do when you only have control over the code?

How Visa Cross Border Solutions Reduces Outages by Testing System Resilience in Their SDLC

Oct 7, 2024 By Gremlin In Gremlin

For global financial services companies, reliability must be built-in and validated before and after shipping to production. Resilience testing is crucial for verifying the reliability of your applications under real-world conditions. But ad-hoc testing and exploratory experiments aren't sufficient: you need to run automated, standardized tests at global scale.

Interpreting your reliability test results

Sep 19, 2024 By Andre Newman In Gremlin

Gremlin’s default suite of reliability tests analyzes critical functions of modern services: scalability, redundancy, and resilience to dependency failures. Services that pass this suite of tests can be trusted to remain available during unexpected incidents. But what happens when a service fails a test? How do you take failed test results and turn them into actionable insights? This blog aims to answer that question.

What's Chaos Monkey? Its Role in Modern Testing

Sep 17, 2024 By Muhammad Raza In Splunk

Chaos Monkey is an open-source tool. Its primary use is to check system reliability against random instance failures. Chaos Monkey follows the testing concept of chaos engineering, which prepares networked systems for resilience against random and unpredictable chaotic conditions. Let’s take a deeper look.

Office Hours: Get better reliability on AWS with our new release

Sep 12, 2024 By Gremlin In Gremlin

Part of the Gremlin Office Hours series: A monthly deep dive with Gremlin experts. Cloud platforms make it easier than ever to deploy massively scalable, distributed workloads, but this is a double-edged sword. There are reliability challenges unique to the cloud that didn’t exist before. Failed migrations, recurring incidents, and reliability toil take their toll.

Release Roundup August 2024

Sep 9, 2024 By Andre Newman In Gremlin

Over the past year, the Gremlin team has focused on giving you more tools to adapt Gremlin to your organization’s reliability needs. We started with customizable reliability tests, and now, we’ve released customizable role-based access controls (RBAC). We’ve also made it easier to target specific availability zones when running Failure Flags experiments, and to run experiments behind a proxy. Keep reading to learn more!

Reliability recommendations when adopting Kubernetes

Sep 3, 2024 By Andre Newman In Gremlin

Kubernetes just celebrated its tenth birthday. That’s 10 years of microservices, containers, service meshes, and many other paradigms that are now common to many developers’ toolkits.

How to verify, document, and prove compliance with Gremlin

Aug 29, 2024 By Gavin Cahill In Gremlin

Resilient and reliable IT systems have become a minimum requirement for modern businesses, a fact driven home by any number of high-profile outages over the past few years. Unfortunately, when those outages are in the financial sector, they can have far-reaching and incredibly damaging results.
