Catchpoint

The SRE Report 2025: Highlighting Critical Trends in Site Reliability Engineering

Catchpoint's annual report reveals the rise of operational toil, the growing importance of user experience as a reliability metric, and the challenges of balancing speed and stability in a rapidly developing AI-driven landscape.

The SRE Report 2025's Call to Action

The SRE Report is now seven years old, and I’ve had the honor and privilege of authoring it for the last five. For this 2025 edition, I worked with some amazing individuals, including Kurt Andersen and Denton Chikura. My heartfelt thanks go to them for shouldering the weight of what is both a labor of love and an often daunting, procrastination-inducing marathon of analysis.

Monitoring in the Age of the Internet: DEM, IPM, and APM - What You Need to Know

Gartner recently published the first-ever Magic Quadrant for Digital Experience Monitoring (DEM). This landmark report raises important questions about what DEM is and why we need a new category now. It also prompts discussions about how DEM, Internet Performance Monitoring (IPM), and Application Performance Monitoring (APM) relate to each other and what roles they play in modern monitoring strategies.

2024: A banner year for Internet Resilience

This was an important year for our industry. On one side, digital transformation efforts continue to make almost every business process and almost every human process digital, and therefore dependent on the Internet. Almost every application and every system is cloud-centric, service-oriented, and composed of multiple geographically dispersed services. The Internet is fragile, complex, and constantly changing.

SSL Monitoring, Trust, and McLOVIN

The recent ServiceNow Secure Sockets Layer (SSL) certificate error disrupted operations for hundreds of organizations, causing widespread connectivity failures. IT operations stalled, developers hit roadblocks, and businesses across industries felt the impact. The culprit? An expired SSL certificate. While these disruptions highlight the importance of SSL monitoring, they point to a deeper issue: trust.

Performing for the holidays: Look beyond uptime for season sales success

With the holiday shopping season in full swing, poor web performance can have a big impact on revenue. There’s intense competition for online shoppers, and customers will quickly bounce to another site instead of slogging through a bad experience. The best way to track and achieve your web performance goals is through experience-based SLOs (Experience Level Objectives, or XLOs).

Catch frustration before it costs you: New tools for a better user experience

Imagine you're on a website trying to purchase a product, but every time you click the "Add to Cart" button, nothing happens. Frustrating, isn’t it? Such moments can deter consumers from completing their online purchases. And while users find this annoying, it poses an even bigger challenge for businesses.

Lessons from Microsoft's Office 365 Outage: The Importance of third-party monitoring

When your software powers productivity for millions of users, trust becomes your ultimate currency. Trust is earned through transparency, clear communication, and unwavering reliability—especially when disruptions occur. Microsoft learned this lesson recently during a significant outage that took down two of its flagship services: Outlook and Teams.