
Incident Review For the Facebook Outage: When Social Networks Go Anti-social

The following is an analysis of the Facebook incident on October 4, 2021. In a highly unusual turn of events, Facebook, Instagram, WhatsApp, Messenger, and Oculus VR were all down simultaneously around the world for several hours on Monday. The social network and several of its key apps started displaying error messages shortly before 16:00 UTC. They remained down until 21:05 UTC, when service began to return gradually to normal.

7 Best Log and Syslog Viewers

Many devices, such as switches, routers, firewalls, servers, and printers, support the syslog protocol. This standard for sending log messages within a network offers critical information about your system, so monitoring your network and its syslog messages should be a top priority. Many IT professionals use log and syslog monitors or viewers to gather logs and syslog messages from across their network into a centralized location.

How Do You Monitor Cassandra Performance: Key Metrics to Measure

Apache Cassandra is a distributed database known for its high availability, fault tolerance, and near-linear scaling. It was originally developed at Facebook, but it is now a widely used open-source system adopted by some of the largest tech companies in the world. There are numerous reasons for its popularity, including the absence of a single point of failure, exceptional horizontal scaling, and a data layout that is well suited to time-series data.

Metrics Dashboard, Scale Testing up to 500K events/sec - Signal 05

A month and thousands of lines of code later, we're here with our monthly product update - Signal #05. We squashed bugs, shipped a custom metrics dashboard, and made improvements to our frontend. We were also featured by one of the top online analytics magazines as one of the leading data observability platforms. 🥳 Let's dive in to see what the humans at SigNoz have been up to!

9 Stackify Competitors to Know in 2021

Stackify is a software company based in Leawood, Kansas, United States. Matt Watson, an American entrepreneur, founded it in January 2012. With a suite of tools such as Prefix and Retrace, Stackify helps software developers troubleshoot and support their applications. According to Stackify, standard APM software is insufficient for managing application code.

Latest Top 21 APM Tools [Open Source Included]

Application Performance Monitoring (APM) tools are now a critical component of distributed applications, but choosing the right one can be tricky. In this article, we go through the top 21 APM tools, including open-source options, that can help you monitor and improve your application's performance.

"Experience is truth": ABN AMRO's Real-World XLAs

“Experience is truth.” That was one of the slogans my colleagues and I came up with in our first meeting as the newly formed Digital Employee Experience team at ABN AMRO, one of the largest banks in the Netherlands. The subtext was that Digital Employee Experience had to be top of mind for every IT project, even if that meant some unconventional thinking. But we were ready for unconventional.

Lessons From An Internet Outage - Issues Caused By Let's Encrypt DST Root CA X3 Expiration

As a monitoring and observability company, we have a lot of monitoring built into our own systems as well. We have the standard monitoring to make sure that systems are performing properly, data is flowing through our infrastructure, and so on. We also monitor for any sudden changes to the tests our customers are running. On September 29, 2021, at 19:21:40 UTC, we started to see a tsunami of alerts at Catchpoint.

Incident Review - Slack Outage Impacts A Subset Of Users Worldwide Due To DNS Issue

DNS observability is an essential part of any Ops team’s strategy. Looking for proof? It’s happening right now. It has been a busy week for Ops teams across the globe: many were forced to urgently rotate SSL certificates after one of Let’s Encrypt’s root certificates expired. Collaboration plays a critical role in situations like these, when members of one team or several teams must communicate and work together to complete a shared task quickly and efficiently.