Creating an agentic feedback loop with reliability guardrails

Jun 25, 2026 By Gavin Cahill In Gremlin

Reliability guardrails help keep your applications reliable without slowing you down. In an earlier post, we explained why agentic AI development needs reliability guardrails: the increased speed of AI development demands automated guardrails to verify resilience, and those guardrails need to cover specific kinds of tests. But that’s only the beginning. By themselves, guardrails act as a gate to ensure resilience mechanisms hold under rapid changes.

Read Post

Why agentic AI development needs reliability guardrails

May 15, 2026 By Gavin Cahill In Gremlin

AI has massively accelerated code deployment. In fact, since the introduction of agentic coding, GitHub has seen exponential growth in PRs, commits, and new repos. Where GitHub originally predicted it would need 10X capacity, it now estimates it will need 30X, and the biggest driver is agentic development. Companies across industries are building agentic pipelines to ship features faster than ever before. That acceleration isn’t without risk.

Read Post

The hidden reliability risks in your agentic AI workflows

Mar 17, 2026 By Andre Newman In Gremlin

Artificial intelligence recently took a major leap from “saying” to “doing.” Instead of simple back-and-forth chats, we’re now allowing automated AI processes to take action on our behalf—from responding to emails to building and deploying complete applications. This shift from “assistant” to “actor” can make applications more capable, but it also creates additional failure modes.

Read Post

Test your AI model training reliability, too

Mar 13, 2026 By Gremlin In Gremlin

Training is at the heart of every LLM, but it’s still an application running on infrastructure, which means it can fail. Our GPU test helps you test your training GPUs so you don’t lose that valuable work. TRANSCRIPT: One of the things we built recently was the GPU Gremlin. So if you are training a bunch of models and you're doing a bunch of GPU testing, we want to give you the tools to be able to go test that, to understand how training the model could fail.

View Video

How Gremlin makes disaster recovery testing easier and faster

Mar 4, 2026 By Gavin Cahill In Gremlin

There’s a common saying: “A backup isn’t a backup until you’ve tested it.” The same is true whether it’s a simple database failover or an entire data center/cloud provider failover. You simply won’t know if it works if you don’t test it. When it comes to disaster recovery testing, that can be an expensive, painful, and arduous process. But it’s required by companies for a reason. And not just for disasters like hurricanes, flooding, or earthquakes.

Read Post

You need to regularly test your reliability

Feb 24, 2026 By Gremlin In Gremlin

Reliability testing isn’t a one-and-done task. You need to test on a regular schedule to make sure your system stays reliable as it changes.

View Video

Announcing Disaster Recovery Testing

Feb 3, 2026 By Andre Newman In Gremlin

Today, we’re launching a new approach to running disaster recovery tests, validating failover processes, and ensuring compliance with regulations such as DORA. With Disaster Recovery Testing, you can run zone, region, and datacenter-scale experiments across your entire Gremlin organization simultaneously.

Read Post

Disaster Recovery Testing by Gremlin

Feb 3, 2026 By Gremlin In Gremlin

Do you know how your system will respond when major outages strike? Disaster Recovery Testing safely simulates real catastrophic failures across your entire system. You can centrally and easily run zone, region, and datacenter-scale reliability tests across your entire organization simultaneously for disaster recovery, business continuity, compliance verification, and more. With Disaster Recovery Testing, tests that used to take engineering-months and dozens of experts can be done safely and securely in hours by a single person.

View Video

AI has to be auditable to be reliable

Jan 28, 2026 By Gremlin In Gremlin

In this clip from an AI roundtable with Gremlin, Nobl9, and PagerDuty, Mandi Walls talks about how companies will want to audit AI to keep it reliable.

View Video

AI reliability needs system reliability

Jan 22, 2026 By Gremlin In Gremlin

AI operates on the same systems and infrastructure as every application, which means if you want to keep it reliable, you have to keep the systems underneath it reliable. Gremlin CEO Kolton Andrus explains more in this clip from an AI reliability roundtable with Nobl9 and PagerDuty.

View Video
