Monthly Archive

Observability and incident response need resilience testing

Jun 28, 2024 By Gavin Cahill In Gremlin

There’s a reason why observability and incident response practices have become standard across modern software development. Anyone wanting to minimize downtime and deliver reliable, available applications needs to have fully instrumented systems and playbooks so they can respond quickly and effectively to outages or incidents. But there’s another piece to the reliability puzzle: resilience testing.

Read Post

Reward engineers who fix problems before they cause outages

Jun 27, 2024 By Gremlin In Gremlin

Are you recognizing the good work engineers do to prevent outages? "The people that are out there doing good work to prevent fires from ever occurring, we're not often recognizing them. We're not often rewarding them. And once things go wrong, someone comes in and fixes it. That's great. That's needed. But we're rewarding that behavior. And so it becomes a bit of people are motivated by what behavior you reward."

View Video

Use the Gremlin API to add Chaos Engineering to your pipeline

Jun 25, 2024 By Gremlin In Gremlin

Did you know you can use the Gremlin API to integrate resiliency tests into your CI/CD pipeline? Our partner Nagarro has even made it part of their shift-left package. "What we do is shift left and add a chaos stage to the pipeline. We have created the shift-left accelerator package. It integrates with load tests and Gremlin APIs to set up the test scenario."

View Video

Gremlin for AWS: Demo from Install to Testing

Jun 20, 2024 By Gremlin In Gremlin

Gremlin for AWS is a suite of tools to more easily find and fix the reliability risks that cause downtime on AWS. The cloud opens up a range of reliability challenges that didn’t exist before, especially for customers running distributed, mission-critical workloads. Teams experience the pain of failed migrations, frequent incidents, and reliability toil, but often struggle to modernize their approach to reliability as they modernize their infrastructure. That’s where Gremlin for AWS can help.

View Video

Introducing Gremlin for AWS

Jun 20, 2024 By Gremlin In Gremlin

Introducing Gremlin for AWS, a suite of tools to more easily find and fix the reliability risks that cause downtime on AWS. Gremlin for AWS helps teams prevent incidents, monitor and test systems for known causes of failure, and gain visibility into the reliability posture of their applications—with 90% less effort.

View Video

Want more software reliability? It starts with leadership

Jun 20, 2024 By Gremlin In Gremlin

If you want to improve reliability, it has to be important from the top down. "As part of the CTO or leadership owning it, they need to tell folks that it's important in the product roadmap, in some of the development schedule, that we spend time on it, that the CEO is the person that holds people accountable, that they review the metrics, that they sit in the outages, that they understand the quality of the software."

View Video

Introducing Gremlin for AWS

Jun 20, 2024 By Ryan Detwiller In Gremlin

Today, Gremlin is introducing Gremlin for AWS, a suite of tools to more easily find and fix the reliability risks that cause downtime on AWS. The cloud opens up a range of reliability challenges that didn’t exist before, especially for customers running distributed, mission-critical workloads. Teams experience the pain of failed migrations, frequent incidents, and reliability toil, but often struggle to modernize their approach to reliability as they modernize their infrastructure.

Read Post

Don't measure reliability with a lagging indicator like downtime or MTTR

Jun 13, 2024 By Gremlin In Gremlin

Your reliability measurement can't just be a lagging indicator. "How do you know your company is doing well at reliability? A lot of people will just look at how many outages have you had in the last year and how much customer pain have you caused? I think that's one side of the coin. That's the reactive lagging indicator of the health of our system. To really be good at this, we need a way to understand the risks and the sharp points so that we have an idea of what we're getting into."

View Video

Office Hours: How to test zone redundancy using Gremlin

Jun 13, 2024 By Gremlin In Gremlin

Part of the Gremlin Office Hours series: a monthly deep dive with Gremlin experts. Zone failures are rare, but they still happen. When an entire zone fails, many of the most common redundancy techniques fail. How do you avoid outages like these, especially if they affect an entire datacenter?

View Video

Reliability should be about empowering teams to make more resilient software

Jun 11, 2024 By Gremlin In Gremlin

Check out how a customer integrated standardized testing into their CI/CD pipeline with minimal lift from individual teams.

View Video

Reliability is more important than ever-are you ready?

Jun 6, 2024 By Gremlin In Gremlin

Reliability and resiliency are getting more and more important. Is your organization ready? "Our digital infrastructure is going to be almost as important as our physical infrastructure. And when it fails, it's going to be a big deal. Like when a huge bank has a multi-day outage, when it impacts travel, safety, military, finance, government, those things are going to be much more important than they have been in the past."

View Video
