Operations | Monitoring | ITSM | DevOps | Cloud

OnlineOrNot's lessons from Cloudflare's outage on 2025-11-18

On 2025-11-18 at 11:48 UTC, Cloudflare declared an incident affecting the global network, which also affected OnlineOrNot. OnlineOrNot monitors websites, APIs, web apps, and cron jobs, and also provides status pages. Although we partially mitigated the issue by enabling a fallback to AWS-based monitoring, between 13:00 UTC and 14:33 UTC failing checks went unreported, heartbeat checks over-reported, and status pages were unavailable.

Five ITOps best practices to stay ahead during major third-party outages

When external providers fail (whether it was the CrowdStrike outage last year, the AWS outage last month, or the Cloudflare DNS outage yesterday), the symptoms inside your environment often look like internal issues: timeouts, login failures, API errors, service degradation, or sudden spikes in dependency-related alerts. It’s natural for teams to start searching through their own infrastructure first, but none of these symptoms clearly points to your systems as the root cause.

AI-Suggested Alert Thresholds for Mobile Telemetry

Life is pretty good. I’ve shipped a mobile app and I’m (happily) drowning in telemetry. Battery impact, time in foreground/background per screen, crash rates, slow frames, network retries – the works. The data is brilliant; the challenge is turning signals into reliable alerts that catch real issues relevant to my app’s functions. So… what should I actually listen for, and where should I set the thresholds?

EP1: Getting started with ServiceDesk Plus MSP Cloud

Join us for a step-by-step tutorial on how to effectively configure your instance, customize your help desk, and automate processes using ServiceDesk Plus MSP Cloud. Also, learn how to leverage essential PSA features such as Timesheets, Billings, and Resource Management to streamline your MSP operations and achieve operational efficiency right from the start.

Best Cheap Black Friday VPS Deals - November 2025: A Cost-Based Analysis

It is November 2025, Black Friday is here, and the VPS hosting world is getting ready for its biggest sale event of the year. VPS deals with huge discount percentages will appear across every website, but the real savings aren't always what they look like at first glance. This guide focuses on total cost and presents a few different options to consider.

The Life Cycle of Data, From Creation to Erasure

Data doesn't just exist; it moves through a predictable, high-stakes life cycle that shapes how securely and efficiently businesses operate. Understanding each phase, from initial creation to final erasure, enables organizations to strengthen governance, mitigate risk and support informed decision-making. Leaders should break down the full life cycle of data to better protect their assets and optimize the flow of information throughout the enterprise.

Cloud Security Best Practices Every Company Should Follow

As more businesses move their data, applications, and daily operations to the cloud, securing that environment has become a top priority. Cloud platforms offer flexibility, scalability, and cost savings, but they also introduce shared responsibility, meaning both the provider and the business must play a role in keeping systems safe. Understanding essential cloud security best practices helps organizations reduce risk, protect sensitive information, and maintain compliance in an increasingly digital world.

AIOps - Laying a Strong Foundation with Full-Stack Observability

It is fair to say that AIOps is much more than just a catchy tagline; it is now a fundamental capability for every enterprise managing modern, cloud-native architectures and distributed systems. As AIOps becomes more widely adopted and organizations expand, the volume of logs, metrics and traces becomes too much for rule-based tracking and monitoring tools. This is the point at which full-stack observability tools are needed: they provide the data that AIOps engines rely on for predictive, proactive detection of performance issues.