Latest Posts

How to test AWS managed services with Gremlin

Aug 1, 2024 By Andre Newman In Gremlin

Note In this blog, we use “managed service providers” to refer to companies that provide hosted computing services, not managed IT service providers (MSPs). ‍ When was the last time you thought about the reliability of your cloud dependencies? The biggest challenge with using cloud platforms and SaaS services is also its biggest strength: the provider controls everything.

Read Post

Gremlin

Read more about How to test AWS managed services with Gremlin

How role-based access control (RBAC) works in Gremlin

Jul 25, 2024 By Andre Newman In Gremlin

Reliability testing and Chaos Engineering are essential for finding reliability risks and improving the resiliency of systems. Gremlin makes it easy to do so, but not every engineer needs access to the same experiments, systems, or services. That’s why we released customizable role-based access controls (RBAC), letting Gremlin customers control which actions your users can perform in Gremlin.

Read Post

Gremlin

Read more about How role-based access control (RBAC) works in Gremlin

Testing for expiring TLS and SSL certificates using Gremlin

Jul 16, 2024 By Andre Newman In Gremlin

Encryption is a fundamental part of nearly every modern application, whether you’re storing data, sending data to customers, or sharing data between backend services. Most organizations have a data encryption strategy, and nearly every web page is using HTTPS, thanks to initiatives like Let’s Encrypt. But setting up encryption isn’t a one-time initiative. Over time, the certificates backing modern encryption expire and need to be replaced.

Read Post

Gremlin

Read more about Testing for expiring TLS and SSL certificates using Gremlin

How to load-balance across multiple availability zones for improved redundancy

Jul 11, 2024 By Andre Newman In Gremlin

Load balancers are some of the most important load-bearing (pun intended) components in cloud environments. They perform multiple critical tasks: network switching, packet inspection, and of course, routing. Most cloud-based load balancers focus on load balancing within a single zone, but what if you have resources spread across multiple zones?

Read Post

Gremlin

Read more about How to load-balance across multiple availability zones for improved redundancy

Intelligent Health Checks: one-click observability for reliability tests

Jul 9, 2024 By Andre Newman In Gremlin

Reliability testing and observability are similar in one important way: engineering teams know they should be doing it, but they’re not sure how to start, or they don’t have the right resources, or they need to focus on competing priorities like feature development and incident response. In an ideal world, reliability and observability would be automated processes that configure, monitor, and run themselves.

Read Post

Gremlin

Read more about Intelligent Health Checks: one-click observability for reliability tests

What is the Well-Architected Cloud Test Suite?

Jul 5, 2024 By Gavin Cahill In Gremlin

When it comes to reliability, cloud providers use a Shared Responsibility Model. In essence, they’ll keep the infrastructure reliable, while you’re responsible for architecting reliability into your systems. To help make this easier, they’ve published a variety of best practice guides, such as the AWS Well-Architected Framework. These lengthy documents are filled with recommendations to help you architect a more secure, more reliable system.

Read Post

Gremlin

Read more about What is the Well-Architected Cloud Test Suite?

How to prevent accidental load balancer deletions

Jul 3, 2024 By Andre Newman In Gremlin

The worst thing you could do after successfully deploying to a new environment is to accidentally delete critical infrastructure. Unfortunately, that happened to one Google Cloud customer when their private cloud subscription was accidentally deleted, resulting in nearly two weeks of downtime. This isn’t an isolated problem either: Microsoft Azure had a similar problem when a typo inadvertently deleted an entire SQL Server instance rather than a specific database.

Read Post

Gremlin

Read more about How to prevent accidental load balancer deletions

Observability and incident response need resilience testing

Jun 28, 2024 By Gavin Cahill In Gremlin

There’s a reason why observability and incident response practices have become standard across modern software development. Anyone wanting to minimize downtime and deliver reliable, available applications needs to have fully instrumented systems and playbooks so they can respond quickly and effectively to outages or incidents. But there’s another piece to the reliability puzzle: resilience testing.

Read Post

Gremlin

Read more about Observability and incident response need resilience testing

Introducing Gremlin for AWS

Jun 20, 2024 By Ryan Detwiller In Gremlin

Today, Gremlin is introducing Gremlin for AWS, a suite of tools to more easily find and fix the reliability risks that cause downtime on AWS. The cloud opens up a range of reliability challenges that didn’t exist before, especially for customers running distributed, mission-critical workloads. Teams experience the pain of failed migrations, frequent incidents, and reliability toil, but often struggle to modernize their approach to reliability as they modernize their infrastructure.

Read Post

Gremlin

Read more about Introducing Gremlin for AWS

Strategies for migrating to Kubernetes

May 24, 2024 By Andre Newman In Gremlin

Migrating to a new platform can often feel like navigating a maze of technical challenges, especially when the platform is as complex as Kubernetes. Kubernetes has a vast number of features designed to help with deploying and managing large applications, but learning how to use it effectively can be just as challenging as‌ moving your workloads over. This doesn’t mean it’s impossible, of course, and there are several strategies for easing this process.

Read Post

Gremlin

Read more about Strategies for migrating to Kubernetes

Operations | Monitoring | ITSM | DevOps | Cloud

Latest Posts

How to test AWS managed services with Gremlin

How role-based access control (RBAC) works in Gremlin

Testing for expiring TLS and SSL certificates using Gremlin

How to load-balance across multiple availability zones for improved redundancy

Intelligent Health Checks: one-click observability for reliability tests

What is the Well-Architected Cloud Test Suite?

How to prevent accidental load balancer deletions

Observability and incident response need resilience testing

Introducing Gremlin for AWS

Strategies for migrating to Kubernetes

Monthly Archive

Follow Us