Chaos Engineering Training: Network Experiments

Jan 6, 2026 By Harness In Harness

Learn how to test your application's resilience to network issues with Harness Chaos Engineering. This tutorial covers essential network chaos experiments. Discover how to simulate real-world network conditions to identify weaknesses and validate your system's behavior under degraded connectivity.

View Video

Chaos Engineering Training: Zonal, Regional Failures and SSL/TLS Certificates Expiration

Jan 6, 2026 By Harness In Harness

Learn how to test your system's resilience against critical infrastructure failures. This tutorial demonstrates how to simulate zonal and regional outages to validate your high availability setup, plus how to test SSL/TLS certificate expiration scenarios. Essential for ensuring your applications can handle real-world failure conditions and maintain uptime during certificate-related issues.

View Video

Chaos Engineering Training: Chaos Hub, Experiment Templates, Import as Local Copy and Reference

Jan 6, 2026 By Harness In Harness

Learn how to leverage Chaos Hub in Harness Chaos Engineering to accelerate your resilience testing. This tutorial covers browsing the Chaos Hub for pre-built experiments, understanding experiment templates, and two key workflows: importing experiments as local copies for customization or referencing them directly from the hub. Perfect for teams looking to quickly implement chaos experiments without building from scratch.

View Video

Improving reliability starts with the 10 most common failures

Dec 19, 2025 By Gremlin In Gremlin

Failures will occur, but reliability testing lets us understand them in advance rather than be surprised by them. Gremlin founder and CEO Kolton Andrus sat down with Stephen Townshend on the Slight Reliability podcast to discuss how.

View Video

How to test application resiliency by simulating the Cloudflare December 2025 outage

Dec 19, 2025 By Gavin Cahill In Gremlin

This fall and winter have had their share of major outages (including at AWS, Azure, and Cloudflare), and December was no exception. On December 5, 2025, Cloudflare suffered a 25-minute outage during which about 28% of the HTTP traffic it serves received HTTP 500 errors. Since Cloudflare handles an average of 81 million HTTP requests per second, this represents a substantial chunk of internet traffic, including traffic to LinkedIn, Zoom, and Downdetector.
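
To put those figures in perspective, here is a rough back-of-the-envelope sketch. It uses only the three numbers quoted above and assumes, purely for illustration, that traffic during the outage matched the stated average and that the affected share stayed constant for the full 25 minutes. The function name and constants are hypothetical and do not come from the post.

```python
# Back-of-the-envelope estimate built only from the figures quoted in the summary.
# Assumption: traffic during the outage equaled the quoted average rate, and the
# affected fraction was constant for the whole outage window.
AVG_REQUESTS_PER_SECOND = 81_000_000  # average HTTP requests per second
AFFECTED_FRACTION = 0.28              # share of traffic that received HTTP 500
OUTAGE_MINUTES = 25                   # reported outage duration


def estimate_failed_requests(rps, fraction, minutes):
    """Return (failed requests per second, total failed requests)."""
    failed_per_second = rps * fraction
    total_failed = failed_per_second * minutes * 60
    return failed_per_second, total_failed


failed_per_second, total_failed = estimate_failed_requests(
    AVG_REQUESTS_PER_SECOND, AFFECTED_FRACTION, OUTAGE_MINUTES
)

print(f"{failed_per_second / 1e6:.1f} million failed requests per second")
print(f"{total_failed / 1e9:.1f} billion failed requests over the outage")
```

Under these assumptions the estimate comes to roughly 22.7 million failed requests per second, or about 34 billion over the outage. The real totals may differ, since actual traffic at the time may not have matched the average.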

Read Post

AI is changing our reliability response teams

Dec 18, 2025 By Gremlin In Gremlin

In this clip from an AI roundtable with Gremlin, Nobl9, and PagerDuty, Mandi Walls talks about how AI is bringing AI engineers into incident response teams.

View Video

Release Roundup 2025: Reliability across AI, on-prem, and applications

Dec 15, 2025 By Andre Newman In Gremlin

2025 was a stark reminder of why reliability is so critical in the tech sector. The year wrapped up with multiple high-profile outages across several major cloud providers, costing companies around the world billions of dollars. Building resilient systems has never been more of a priority, especially as we move into the era of agentic AI.

Read Post

Software failure will occur, but we can be ready

Dec 12, 2025 By Gremlin In Gremlin

Failures will occur, but reliability testing lets us understand them in advance rather than be surprised by them. Gremlin founder and CEO Kolton Andrus sat down with Stephen Townshend on the Slight Reliability podcast to discuss how.

View Video

How to use Gremlin's Reliability Report

Dec 12, 2025 By Gavin Cahill In Gremlin

Modern applications can easily include hundreds of discrete services, all of which need to be reliable in order for the application to function correctly. While running tests on a handful of critical services can lead to small reliability improvements, real impact requires testing, and visibility into reliability, across your entire organization. That’s the logic behind the new, improved Reliability Reports within Gremlin.

Read Post