Operations | Monitoring | ITSM | DevOps | Cloud

March 2020

February 2020 Downtime Report

February kicked 2020 off with a terrifying glimpse into what happens when the Internet of Things stops Internetting things. If we consider our central question this year of uptime in the age of always-connected, then we start to see the impact of hidden failures. All the stuff we don’t know we know impacts the end-user. Someone forgets to renew a TLS certificate, half the business world can’t collaborate. Someone else flubs an update?

How to Improve Downtime Response: Error Budgeting and Unplanned Downtime

Every one of us reading this blog has seen a fire spring up and quietly walked away from the impending chaos. And everyone one of us has managed to live this long because we understand when to react to a fire. A real fire affects our Service Level Objectives (SLO), and affects the user base. You need to figure out where it is, what started it, and what your team will do about it, and you need to do that now.