
February 2020

Why Your Status Page Matters and How to Use It

When an outage hits your service, everybody starts talking. Your engineers are talking about what caused the problem and how to fix it; your management is asking when it’ll be fixed; and your customers are telling the world that they’re not happy. But there’s an even more important conversation you should be having: communicating with your users about the issue.

January 2020 Outage Report

Welcome to 2020, where Google Drive can fail for some of you but not others, you can’t access your passwords, and you can’t withdraw cash on vacation. This stranded-on-a-desert-isle dream was reality in January, which saw drama in the financial services and internet infrastructure sectors. January’s downtime reinforces just how connected we have become, and how reliant we are on infrastructure that can seemingly fail on a whim.

Transaction Monitoring | Upgrades and Use Cases in 2020

Synthetic monitoring takes care of the small interactions on your website that QA can’t catch. If you’re building an application for the web, a transaction check is an integral part of proactive downtime resolution. What we call transaction monitoring, or a transaction check, is a set of instructions that a probe server follows.
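To make the idea of "a set of instructions that a probe server follows" concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular monitoring product works: the `Step` type, the `run_transaction` function, the paths, and the `fake_fetch` stand-in site are all hypothetical names invented for this example, and the fetcher is faked so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# Hypothetical step: a path to request and text the response must contain.
@dataclass
class Step:
    name: str
    path: str
    expect_text: str


# A fetcher takes a path and returns (status_code, body).
Fetcher = Callable[[str], Tuple[int, str]]


def run_transaction(steps: List[Step], fetch: Fetcher) -> Tuple[bool, str]:
    """Run the steps in order; stop at the first failure and report it."""
    for step in steps:
        status, body = fetch(step.path)
        if status != 200:
            return False, f"{step.name}: unexpected status {status}"
        if step.expect_text not in body:
            return False, f"{step.name}: missing text {step.expect_text!r}"
    return True, "all steps passed"


# Stand-in for a real site (an assumption, so the sketch needs no network).
FAKE_SITE: Dict[str, Tuple[int, str]] = {
    "/login": (200, "<h1>Sign in</h1>"),
    "/cart": (200, "<p>Your cart</p>"),
    "/checkout": (500, "Internal Server Error"),
}


def fake_fetch(path: str) -> Tuple[int, str]:
    return FAKE_SITE.get(path, (404, "Not Found"))


CHECKOUT_FLOW = [
    Step("open login page", "/login", "Sign in"),
    Step("view cart", "/cart", "Your cart"),
    Step("start checkout", "/checkout", "Payment"),
]

ok, detail = run_transaction(CHECKOUT_FLOW, fake_fetch)
print(ok, detail)
```

In this sketch the first two steps pass and the third fails, so the check reports which step broke the flow rather than only that the site responded. A real probe would swap `fake_fetch` for an HTTP client or a scripted browser.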

Got Game? Secrets of Great Incident Management

When his phone wakes him at two in the morning, operations engineer Andy Pearson knows it’s bad news. There’s a major server problem, and hundreds of client websites are down. Automated monitoring checks detected the outage within seconds and paged the on-call engineer. This time, it’s Pearson in the hot seat. Pearson quickly confirms the issue is real and escalates it to his boss, tech lead Lewis Carey.