Latest Posts

The High Price of Internet Disruptions: New Study Reveals the Financial Impact on eCommerce Companies

Internet disruptions can be a real headache for any organization, but for eCommerce companies in particular, they’re proving to be a lot more than just an inconvenience. A new study by Forrester Consulting is bound to send shockwaves through the industry by quantifying the actual cost of Internet disruptions. Spoiler alert: it’s higher than you think.

Mythbusting IPv6 with Jan Zorz

IPv6 was developed in the late 1990s as a successor to IPv4 in response to widespread concerns about the growth of the Internet and its potential impact on the existing IPv4 address space, in particular potential address exhaustion. It was assumed that after some time as a dual-stack solution, we would phase out IPv4 entirely. Almost twenty-five years later, however, we are approaching full-scale depletion of IPv4 addresses, in part because the adoption of IPv6 is still lagging.

From Dial-Up to the Cloud: Why APM is Not Enough in the Age of the Internet

What would you be doing right now if the Internet didn't exist? The World Wide Web as we know it is only a few decades old, but it's hard to imagine life without it. I fondly recall the early days of the "personal" Internet, when I used a 56k modem and waited anxiously for that oh-so-familiar connecting sound to access my AOL account and check if I had mail. We've come a long way from those humble beginnings.

The SRE Report 2023: Forecasts and the Current Economy

As questions and challenges loom over the tech industry and the larger economy, now is a perfect time for us to take a step back and learn from the past. As reliability engineers, we regularly use Service Level Objectives (SLOs) to understand the performance, reliability, and trends of our systems to help inform and prioritize our decision making.

Preventing Outages in 2023

The outages span the giants of the Internet and some of the biggest failures of IT resilience we were subject to – from AWS’s trifecta of outages in December 2021 to the October ’21 outage that took down Facebook, Instagram, WhatsApp, and interrelated services. We also look at some more intermittent outages that you may have missed.

SRE Report 2023: Findings From the Field - Toil

Toil. Few other words have the same visceral impact for SREs as their four-letter nemesis: toil. Although pretty much everyone recognizes and agrees that toil is bad, the term is frequently misused in everyday conversation. In common English usage, toil is defined as “long strenuous fatiguing labor.” As a term of art in the SRE profession, “toil” has several very specific characteristics that distinguish it from other sorts of work people spend time on.

'Preventing Outages in 2023: What We Can Learn from Recent Failures' Provides Analysis of Internet Failures and Key Learnings

New white paper from Catchpoint provides in-depth analysis of key Internet outages over the past 18 months, from AWS to Facebook, and includes six critical lessons for IT teams to improve Internet Resilience.

Microsoft Cloud Outage Causes Global Workforce Disruptions

Many of us (indeed, more than 1 billion users worldwide) rely on Microsoft for essential work activities and were impacted yesterday (Wednesday, January 25, 2023) when the cloud service provider experienced a prolonged outage. Internet Resilience is a business priority because when critical workforce services like Microsoft go down, global teams are hugely disrupted.