
Incident Review - An Account Of The Telia Outage And Its Ripple Effect

Another major outage on the Internet has taken place today. Telia, a major backbone carrier in Europe, suffered from a network routing issue between 16:00 and 17:05 UTC. This had a huge ripple effect, causing issues for multiple key companies providing critical cloud and infrastructure services. Companies affected include:

- Google Cloud
- Equinix Metal
- Cloudflare
- Fastly
- NS1

It’s always arresting to see the secondary and tertiary effects that a major outage can have.

Incident Review For the Facebook Outage: When Social Networks Go Anti-social

The following is an analysis of the Facebook incident on October 4, 2021. In a highly unusual turn of events, Facebook, Instagram, WhatsApp, Messenger, and Oculus VR were down simultaneously around the world for an extended period on Monday. The social network and some of its key apps started to display error messages before 16:00 UTC. They remained down until 21:05 UTC, when service began to gradually return to normal.

Incident Review - Slack Outage Impacts A Subset Of Users Worldwide Due To DNS Issue

DNS observability is an essential part of any Ops team’s strategy. Looking for proof? It’s happening right now. It has been a busy week for Ops teams across the globe. Many were forced to urgently rotate SSL certificates after one of Let’s Encrypt’s root certificates expired. Collaboration plays a critical role in such situations, where members of one or more teams must communicate and work with each other to complete a collective task rapidly and efficiently.

Lessons From An Internet Outage - Issues Caused By Let's Encrypt DST Root CA X3 Expiration

As a monitoring and observability company, we have a lot of monitoring built into our own systems. We have the standard monitoring to make sure that systems are performing properly, data is flowing through our infrastructure, etc. At the same time, we have monitoring for any sudden changes to tests that our customers are running. On September 29, 2021, at 19:21:40 UTC, we started to see a tsunami of alerts at Catchpoint.

Salesforce Application Performance Monitoring

Can you imagine trying to keep track of all your prospect- and customer-related activities on a spreadsheet? What about ye olde days of Rolodexes (do people still remember what those are?!)? Thank goodness for Salesforce, the Customer Relationship Management (CRM) solution that revolutionized sales, marketing, and customer care, and how we interact with customers in general. Salesforce is a critical component for many businesses.

Planning Ahead to Maximize the Value of Attending VMworld

Attending an event as magnificent as VMworld with a plan of attack, versus without one, is a night-and-day experience. I’ve had the good fortune of attending this event as both a speaker and an attendee, and I look back with fondness on the interactions and experience. However, both those visits were in person. As I prepared to attend VMworld virtually for the first time, I wondered if my plan of attack would change. Turned out, not so much.

Catchpoint Digital Experience Score Is An Industry-First

Catchpoint recently announced the Digital Experience Score. This score is the first all-encompassing metric to represent all essential drivers of digital end-user experience. With pressure on IT teams ever growing to fix the IT issues of a remote workforce, we wanted to make troubleshooting as straightforward as possible. The score provides IT teams tasked with improving employee experience with a quantifiable measurement of what each employee is experiencing digitally.

Catchpoint Co-Founders Q&A: What Better Way To Celebrate Our 13th Birthday?

To celebrate our 13th birthday today, I sat down with Catchpoint's co-founders and my friends, Mehdi Daoudi, Chief Executive Officer, Drit Suljoti, Chief Product and Technology Officer, and J. Scotte Barkan, Chief Technology Officer (dialing in from Long Island after a long week of patch fixes), for an informal chat. We looked back to the days when they all met at DoubleClick prior to the three of them (along with Veronica Ellis, now a Principal Engineer at Eventbrite) founding Catchpoint.

Incident Review - What Was Behind the September 7 Spectrum Outage: A Case of Dr. BGP Hijack or Mr. BGP Mistake?

September 7, 2021, 16:36 UTC: an outage hit Spectrum cable customers in the Midwest of the U.S., including Ohio, Wisconsin and Kentucky. Users of their broadband and TV services hit social media to voice their annoyance at the disruption it was causing. Everything was resolved at around 18:11 UTC, and services were restored to users.