Latest News

Learning Moment: Effective Customer Communication During Incidents - Enhance Visibility & Response with Uptime.com

Jul 19, 2024 By Jonathan Franconi In uptime

The recent global outage caused by a faulty software update reminded me of how vulnerable we are today and, most importantly, how close we always are to global-scale incidents, given millions of interconnected dependencies. When the foundation of the house collapses, everything built on top of it is impacted. Those of us in IT Operations, Monitoring, Observability (insert the current acronym), etc., know this risk firsthand; we face it every day.

Chaos Testing Explained

Jul 19, 2024 By Shanika Wickramasinghe In Splunk

Chaos testing is a part of site reliability engineering (SRE). In chaos testing, we intentionally break things in and around a given application to assess how software systems respond to scenarios such as network outages, hardware failures, database failures, and server or cluster node failures in the infrastructure.
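
To make the idea concrete, here is a minimal Python sketch of a fault-injection experiment. It assumes a hypothetical service, `fetch_profile`, that should fall back to a cached value when its database is unreachable; `inject_faults`, `run_experiment`, and the other names are illustrative and not part of any chaos-engineering tool. Real chaos testing usually injects faults at the infrastructure level (killing nodes, dropping network traffic), but the structure is the same: inject a failure, then check that the system still behaves acceptably.

```python
import random
from typing import Callable, Dict, Tuple


def inject_faults(fn: Callable[[], str], failure_rate: float,
                  rng: random.Random) -> Callable[[], str]:
    """Wrap a dependency call so each invocation fails with probability failure_rate."""
    def wrapped() -> str:
        if rng.random() < failure_rate:
            # Simulate the dependency (e.g. a database) being unreachable.
            raise ConnectionError("injected fault: dependency unavailable")
        return fn()
    return wrapped


def fetch_profile(db_call: Callable[[], str],
                  cache: Dict[str, str]) -> Tuple[str, str]:
    """Hypothetical system under test: serve a cached value when the DB is down.

    Returns (value, source), where source is "db" or "fallback".
    """
    try:
        value = db_call()
        cache["profile"] = value
        return value, "db"
    except ConnectionError:
        return cache.get("profile", "default-profile"), "fallback"


def run_experiment(failure_rate: float, trials: int = 1000,
                   seed: int = 42) -> Tuple[int, int]:
    """Return (served_from_db, served_from_fallback).

    Any exception that escapes fetch_profile fails the experiment outright.
    """
    rng = random.Random(seed)  # seeded so the experiment is reproducible
    db = inject_faults(lambda: "fresh-profile", failure_rate, rng)
    cache: Dict[str, str] = {}
    from_db = from_fallback = 0
    for _ in range(trials):
        _, source = fetch_profile(db, cache)
        if source == "db":
            from_db += 1
        else:
            from_fallback += 1
    return from_db, from_fallback
```

Calling `run_experiment(0.3)` should serve roughly 30% of requests from the fallback without raising. The hypothesis under test is "users never see an error when the database fails", so an unhandled exception, not a lower success count, is what signals a weakness.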

Monitoring Healthtech Applications with Custom Metrics

Jul 19, 2024 By Lauren Barnes In MetricFire

Staying healthy is essential, and tracking health and wellness is easier than ever. This article discusses how monitoring helps developers keep their health applications running smoothly so that people can stay healthy.

July 19th global IT outage reminds us of digital complexity

Jul 19, 2024 By Dritan Suljoti In Catchpoint

As we write on Friday, July 19th, a massive global IT outage continues to take down critical services around the world that depend on Microsoft Windows computers.

Global Microsoft Outage and Preventing Future Vulnerabilities

Jul 19, 2024 By Mishal Alam In uptime

In a recent unexpected turn of events, a faulty component in the latest CrowdStrike Falcon update led to widespread outages, crashing Windows systems globally. The repercussions were felt across various sectors, including airports, TV stations, hospitals, and even emergency services in the U.S. and Canada. The glitch affected both Windows workstations and servers, bringing entire companies to a standstill and crashing fleets of hundreds of thousands of computers.

The IT Scramble is On with a Microsoft Outage: Incident MO821132 - July 18, 2024

Jul 19, 2024 By Sara Purdon In Martello Technologies

On July 18, 2024, at 6:38 pm ET, Vantage DX, Martello’s Microsoft 365 and Teams performance management solution, started to see indicators of a likely Microsoft outage impacting users’ ability to access various Microsoft 365 apps and services. Just over an hour later, at 7:41 pm ET, Microsoft issued a statement on X.

Understanding and Troubleshooting Out of Memory Error Code 137

Jul 19, 2024 By Dmitry Maximov In StackState

If you've encountered the dreaded "exit code 137" error message while working with Docker, Kubernetes, or other containerized environments, you're not alone. This error can be frustrating and difficult to troubleshoot, but understanding its causes and solutions can help you keep your applications running smoothly. This comprehensive guide will delve into the intricacies of error code 137, its common scenarios, and strategies to resolve it.
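
Exit code 137 follows the Unix convention that a process terminated by a signal is reported with status 128 + the signal number, so 137 = 128 + 9, i.e. SIGKILL, which is the signal the Linux OOM killer sends. The minimal Python sketch below decodes such statuses; the function name `describe_exit_code` is hypothetical, and a status above 128 is only a hint, since a program can also call `exit(137)` itself and SIGKILL can come from sources other than the OOM killer (for example, a forced `docker kill`).

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a shell or container exit status (POSIX systems).

    By convention, a process killed by signal N is reported as 128 + N,
    so 137 -> SIGKILL (9) and 143 -> SIGTERM (15).
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = "an unknown signal"
        return f"killed by {name} (signal {signum})"
    return f"exited with status {code}"


print(describe_exit_code(137))
```

Note that Python's own `subprocess` module reports a signal-killed child differently, as a negative `returncode` (for example, `-9`); the 128 + N form is what shells and container runtimes such as Docker show.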

UptimeRobot Alerts Spike 5x Due to Microsoft/CrowdStrike Global Issues

Jul 19, 2024 By Tomas Koprusak In Uptime Robot

Given recent global events, UptimeRobot is experiencing an increased number of downtime notifications. We are currently sending out five times as many notifications as usual due to a widespread outage affecting several critical services worldwide. Here’s a brief overview of the situation and how it affects our monitoring services.

Understanding O11y - A Beginner's Guide

Jul 19, 2024 By Yuvraj Singh Jadon In SigNoz

“O11y”, short for observability, is changing how we manage system performance. This guide walks you through the essentials of O11y and how to implement it effectively. Let's dive in.

Nexthink Stops MS Outage From Hurting a Leading Consumer Goods Company

Jul 19, 2024 By Gaurang Ganatra In Nexthink

While individual blue screen errors are frustrating, the recent global system crashes caused by a CrowdStrike update incompatible with Microsoft Windows have wreaked havoc across entire industries since early Friday morning. Companies in the airline, media, and banking industries have faced significant disruptions, with thousands of customer-facing devices showing blue screens and causing widespread travel delays and chaos.
