Operations | Monitoring | ITSM | DevOps | Cloud

Reliability Resolutions: How to build effective reliability programs that won't fade away

Did you know the third week of January is the most common time for people to fail New Year’s Resolutions? It doesn’t matter whether it’s exercising more, learning a new language, or just trying to drink less coffee, that initial surge of fresh New Year’s energy is fading, and if you want to make a resolution stick, this is the key time to make a lasting change. The same is true with any reliability resolutions you might have made.

How to test application resiliency by simulating the Cloudflare December 2025 outage

This fall and winter have had their share of major outages (including AWS, Azure, and Cloudflare), and December was no exception. On December 5, 2025, Cloudflare suffered a 25-minute outage that served responses with HTTP 500 errors to about 28% of HTTP traffic served by Cloudflare. Since Cloudflare handles an average of 81 million HTTP requests per second, this represents a substantial chunk of internet traffic, including LinkedIn, Zoom, and Downdetector.

Release Roundup 2025: Reliability across AI, on-prem, and applications

2025 was a stark reminder of why reliability is so critical in the tech sector. The year wrapped up with multiple high-profile outages across several major cloud providers, costing companies around the world billions of dollars. Building resilient systems has never been more of a priority, especially as we move into the era of agentic AI.