December 2024

Gremlin's 2024 year-end Release Roundup

It’s been a busy year at Gremlin! We released two new experiments, added an entirely new onboarding process and features for AWS users, added a brand new Test Suite and Detected Risks, and made many UI improvements to our web app. We beefed up our agents with more enterprise capabilities, including support for large Kubernetes clusters and systems with over 64 CPUs, improved experiment behaviors, improved dependency detection, and per-team Private Network Integrations.

Why Gremlin: Today's complex applications need a different approach to reliability

Cloud-based distributed applications have changed how we need to approach reliability and resiliency. How do you make your applications reliable? Here’s Gremlin CEO Josh Leslie to tell you how. Today’s dynamic applications are too complex, and change too constantly, for humans to wrap their heads around. This means the reliability approaches that worked ten years ago simply won’t be enough. As a technology company (and these days, every company is a technology company), you need to take a different, programmatic approach to testing and improving the reliability of your applications.

Test for the common failures that cause 80% of outages with Gremlin

80% of failures at the infrastructure layer come from the same core gaps in reliability. Jeff Nickoloff, Gremlin Principal Engineer, goes over how Reliability Management test suites help improve reliability across your organization. Are you waiting for the other reliability shoe to drop and hoping that you actually fixed core resilience issues? Or do you know for sure that you’re resilient to common reliability issues?

Release Roundup November 2024: Reliability in the serverless and AI era

2024 is coming to a close, and while many teams are slowing down in preparation for the holidays, we’ve been cooking up tons of new features. We’ve extended our platform support to the Istio service mesh, added a brand new experiment type for testing artificial intelligence (AI) and large language model (LLM) workloads, and made it easier to onboard Kubernetes clusters. We’ve also made our Linux and Windows agents more robust and performant.

Now in private beta: Gremlin Service Mesh Extension

Service meshes like Istio have become an essential way to securely and reliably distribute network traffic, especially with ephemeral, service-based architectures such as Kubernetes. However, their constantly shifting nature can interfere with targeting specific services for resilience tests. Infrastructure-based testing is designed to target specific IP addresses, allowing precision testing of applications, VMs, and nodes.

Reliable AI models, simulations, and more with Gremlin's GPU experiment

Note: This blog uses “GPU” to refer to the entire processing circuit, including the GPU processor, video memory, and other supporting hardware.

Artificial Intelligence (AI) has become one of the biggest tech trends in years. From generating full movies to updating its own code, AI is performing tasks that were once science fiction.