Spend a little time on software reliability now instead of a lot of time later

Jul 11, 2024 By Gremlin In Gremlin

You're going to spend time fixing reliability, but it's your choice whether that happens during an outage or ahead of time, on your schedule and at lower cost. Which will you choose? "We all know when things go wrong, it cost us a million dollars and it was really bad. Let's have that never happen again. But when we say, I need every engineering team to spend one hour, one day a week on reliability, does everyone lose their mind, or is that a reasonable request? Can we amortize out the cost of that?"

View Video

How to run fault injection tests on AWS managed services

Jul 11, 2024 By Gremlin In Gremlin

Part of the Gremlin Office Hours series: A monthly deep dive with Gremlin experts. Fully managed SaaS services offer incredible scalability and accessibility, but at a cost: they’re also single points of failure. If your application depends on a SaaS service and the service fails, guess who your customers will blame? We need to design applications to anticipate and work around managed service failures, but how do we do that without having to wait for the service to fail?

View Video

How to load-balance across multiple availability zones for improved redundancy

Jul 11, 2024 By Andre Newman In Gremlin

Load balancers are some of the most important load-bearing (pun intended) components in cloud environments. They perform multiple critical tasks: network switching, packet inspection, and of course, routing. Most cloud-based load balancers focus on load balancing within a single zone, but what if you have resources spread across multiple zones?

Read Post

Gremlin's API makes it easy to integrate testing in your CI/CD pipeline

Jul 9, 2024 By Gremlin In Gremlin

Thinking about integrating Gremlin into your existing pipeline? Look no further than the Gremlin API. "The next step then was to build the right tooling such that the resiliency tests can be run from a pipeline. Gremlin's API-first approach made it possible to do this in a very easy manner because everything that we could do from the UI and manually, we could replicate all of that through the API as well."

View Video

Intelligent Health Checks: one-click observability for reliability tests

Jul 9, 2024 By Andre Newman In Gremlin

Reliability testing and observability are similar in one important way: engineering teams know they should be doing both, but they’re not sure how to start, or they don’t have the right resources, or they need to focus on competing priorities like feature development and incident response. In an ideal world, reliability and observability would be automated processes that configure, monitor, and run themselves.

Read Post

What is the Well-Architected Cloud Test Suite?

Jul 5, 2024 By Gavin Cahill In Gremlin

When it comes to reliability, cloud providers use a Shared Responsibility Model. In essence, they’ll keep the infrastructure reliable, while you’re responsible for architecting reliability into your systems. To help make this easier, they’ve published a variety of best practice guides, such as the AWS Well-Architected Framework. These lengthy documents are filled with recommendations to help you architect a more secure, more reliable system.

Read Post

How to prevent accidental load balancer deletions

Jul 3, 2024 By Andre Newman In Gremlin

The worst thing you could do after successfully deploying to a new environment is to accidentally delete critical infrastructure. Unfortunately, that happened to one Google Cloud customer when their private cloud subscription was accidentally deleted, resulting in nearly two weeks of downtime. This isn’t an isolated problem either: Microsoft Azure had a similar problem when a typo inadvertently deleted an entire SQL Server instance rather than a specific database.

Read Post

Observability and incident response need resilience testing

Jun 28, 2024 By Gavin Cahill In Gremlin

There’s a reason why observability and incident response practices have become standard across modern software development. Anyone wanting to minimize downtime and deliver reliable, available applications needs to have fully instrumented systems and playbooks so they can respond quickly and effectively to outages or incidents. But there’s another piece to the reliability puzzle: resilience testing.

Read Post

Reward engineers who fix problems before they cause outages

Jun 27, 2024 By Gremlin In Gremlin

Are you recognizing the good work engineers do to prevent outages? "The people that are out there doing good work to prevent fires from ever occurring, we're not often recognizing them. We're not often rewarding them. And once things go wrong, someone comes in and fixes it. That's great. That's needed. But we're rewarding that behavior. And so it becomes a bit of people are motivated by what behavior you reward."

View Video

Use the Gremlin API to add Chaos Engineering to your pipeline

Jun 25, 2024 By Gremlin In Gremlin

Did you know you can use the Gremlin API to integrate resiliency tests into your CI/CD pipeline? Our partner Nagarro has even made it part of their shift-left package. "What we do is shift left and add a chaos stage to the pipeline. We have created the shift left accelerator package. It integrates with load tests and Gremlin APIs to set up the test scenario."

View Video
