From Chaos Engineering to Resilience Testing: Why We're Expanding How Teams Validate Reliability | Harness Blog

Feb 27, 2026 By Uma Mukkara In Harness

At Harness, we’re committed to helping teams build and deliver software that doesn’t just work – it thrives under pressure, scales reliably, and recovers swiftly from the unexpected. Today, we’re taking the next step in that mission by evolving our Chaos Engineering module into Resilience Testing. This evolution reflects how reliability is tested in practice today.

You need to regularly test your reliability

Feb 24, 2026 By Gremlin In Gremlin

Reliability testing isn’t a one-and-done thing. You need to test on a regular schedule to make sure your system stays reliable as systems change.

Announcing Disaster Recovery Testing

Feb 3, 2026 By Andre Newman In Gremlin

Today, we’re launching a new approach to running disaster recovery tests, validating failover processes, and ensuring compliance with regulations such as DORA. With Disaster Recovery Testing, you can run zone, region, and datacenter-scale experiments across your entire Gremlin organization simultaneously.

Disaster Recovery Testing by Gremlin

Feb 3, 2026 By Gremlin In Gremlin

Do you know how your system will respond when major outages strike? Disaster Recovery Testing safely simulates real catastrophic failures across your entire system. You can centrally and easily run zone, region, and datacenter-scale reliability tests across your entire organization simultaneously for disaster recovery, business continuity, compliance verification, and more. With Disaster Recovery Testing, tests that used to take engineering-months and dozens of experts can be done safely and securely in hours by a single person.

AI has to be auditable to be reliable

Jan 28, 2026 By Gremlin In Gremlin

In this clip from an AI roundtable with Gremlin, Nobl9, and PagerDuty, Mandi Walls talks about how companies will want to audit AI to keep it reliable.

AI reliability needs system reliability

Jan 22, 2026 By Gremlin In Gremlin

AI operates on the same systems and infrastructure as every application, which means if you want to keep it reliable, you have to keep the systems underneath it reliable. Gremlin CEO Kolton Andrus explains more in this clip from an AI reliability roundtable with Nobl9 and PagerDuty.

Reliability Resolutions: How to build effective reliability programs that won't fade away

Jan 21, 2026 By Gavin Cahill In Gremlin

Did you know the third week of January is the most common time for people to fail New Year’s Resolutions? It doesn’t matter whether it’s exercising more, learning a new language, or just trying to drink less coffee, that initial surge of fresh New Year’s energy is fading, and if you want to make a resolution stick, this is the key time to make a lasting change. The same is true with any reliability resolutions you might have made.

AI reliability requires different SLOs

Jan 16, 2026 By Gremlin In Gremlin

In this webinar clip, Alex Nauda, CTO of Nobl9, explains how keeping AI reliable means changing how you look at SLOs.

We test our own critical dependencies

Jan 14, 2026 By Gremlin In Gremlin

Even if you know a dependency is critical, you still should test it. Otherwise, who knows what will happen if it goes down?

Recommended Experiments for Production Resilience in Harness Chaos Engineering | Harness Blog

Jan 9, 2026 By Ashutosh Bhadauriya In Harness

This guide covers battle-tested chaos experiments for Kubernetes, AWS, Azure, and GCP to help you validate production resilience before real failures happen. Start with low blast radius experiments (pod-level) and gradually progress to higher impact scenarios (node/zone failures), always defining clear hypotheses and using probes to measure results. Building reliable distributed systems isn't just about writing good code. It's about understanding how your systems behave when things go wrong.
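
The progression described in this excerpt (smallest blast radius first, a stated hypothesis per experiment, a probe to check the result) could be sketched roughly as follows. This is a minimal illustration, not Harness's API: the Experiment class, the BLAST_RADIUS ranking, run_progressively, and the stubbed probes are all hypothetical names invented for the example, and no fault is actually injected.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical ranking; the levels mirror the excerpt (pod, then node, then zone).
BLAST_RADIUS = {"pod": 0, "node": 1, "zone": 2}


@dataclass
class Experiment:
    name: str
    scope: str                  # "pod", "node", or "zone"
    hypothesis: str             # what should stay true while the fault is active
    probe: Callable[[], bool]   # returns True if the hypothesis held


def run_progressively(experiments: List[Experiment]) -> List[str]:
    """Run from smallest to largest blast radius; stop at the first failed probe."""
    completed = []
    for exp in sorted(experiments, key=lambda e: BLAST_RADIUS[e.scope]):
        # A real run would inject the fault here; this sketch only evaluates the probe.
        if not exp.probe():
            break
        completed.append(exp.name)
    return completed


# Stubbed probes stand in for real health checks (assumption for illustration).
experiments = [
    Experiment("zone-outage", "zone", "traffic fails over to another zone", lambda: True),
    Experiment("pod-delete", "pod", "service keeps serving while a pod restarts", lambda: True),
    Experiment("node-drain", "node", "pods reschedule within the SLO", lambda: False),
]

passed = run_progressively(experiments)
print(passed)
```

Stopping at the first failed probe keeps a weakness found at a small scope from being exercised again at a larger one before it is fixed.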
