In each year’s SRE Report, a trend or anti-pattern leaps out and makes us pause and reflect. Last year, for example, we found a huge drop in global toil levels. With the whole world working from home for a full year, it made sense that global toil levels would drop, right? But this year, despite the great reopening underway, toil levels dropped even further. It’s a paradox, one which will no doubt require its own scrutiny.
Not all Internet outages take a website down. Some may impact only a subset of users or affect just one part of a site’s functionality. Moreover, because of their relatively “hidden” nature, organizations may not know about them immediately, since fewer users will complain. However, such incidents can still have serious consequences, so you want to detect them as soon as possible in order to mitigate and resolve them quickly.
Dear Santa, I’ve been an extremely good IT Operations Manager this year (which is saying something considering the state of the world at the moment) and I have a few items on my wish list.
Most Internet-centric organizations today use some form of APM tool, as they should. But APM tools alone are insufficient. Over the last ten years, the world has completely changed. If you think about it, in the first decade of this millennium, most businesses had an Exchange server, maybe Siebel CRM, a file share, and a range of other business apps, usually hosted in the same building. Everything was on the LAN. Today, it is the exact opposite. Everything is distributed.
We are on the cusp of an important technology transformation: a discontinuity in technology, as Peter Drucker would call it, precipitated by Covid. For decades, IT organizations invested in building, managing, and monitoring LANs. Everything was on your local network: your CRM, your Exchange email, the file shares, and the print server. Today, many companies are shutting down their “old legacy network” and are running their enterprise without a LAN, WAN, or on-prem datacenter.
Metrics, metrics everywhere... a gauge here, a counter there... milliseconds, percentages... a list of variables running into pages... what is fast, what is slow...? how on earth is one to know...? Today we have all manner of variables around us, of differing gravity, that each have their own individual purpose in the measurement of web performance. Some of these are atomic or independent metrics, whereas others are aggregated or dependent.
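To make the atomic versus aggregated distinction concrete, here is a minimal sketch. The metric names (`dns`, `connect`, `wait`) and all of the timing values are illustrative assumptions, not data from any real measurement or tool. Each atomic metric is measured on its own, while the response time and its median are derived from them.

```python
from statistics import median

# Hypothetical per-request timings in milliseconds (illustrative values only).
# Each key is an atomic (independent) metric, measured on its own.
samples = [
    {"dns": 20, "connect": 30, "wait": 150},
    {"dns": 25, "connect": 35, "wait": 400},
    {"dns": 15, "connect": 25, "wait": 110},
]


def response_time(sample):
    """Dependent metric: derived by summing the atomic timings of one request."""
    return sample["dns"] + sample["connect"] + sample["wait"]


# Aggregated metric: one number summarizing many requests.
totals = [response_time(s) for s in samples]
median_response = median(totals)

print("per-request response times (ms):", totals)
print("median response time (ms):", median_response)
```

A slow `wait` value in a single request barely moves the median, which is one reason an aggregate can hide a problem that the atomic metrics would show.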
BGP is effectively the postal service of the Internet. Without BGP, traffic doesn't move. So, when there's a configuration issue or, worse, malicious activity, the repercussions can be huge. That's why constant monitoring of BGP traffic is crucial. In this ten-minute video, Solutions Engineer Zach Henderson explains why BGP issues can damage your bottom line and then shows how to quickly detect, analyze, and resolve them with Catchpoint's market-leading BGP Monitoring solution.
I’ve had the honor and privilege of authoring The SRE Report for the last three years. For the 2023 version, this included working with some amazing individuals like Anna Jones, Kurt Andersen, and Steve McGhee. Download The SRE Report 2023 here (no registration required).