The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Top 12 SolarWinds Competitors and Alternatives In 2024

Organizations exploring SolarWinds alternatives often face a critical decision when choosing the right network and infrastructure monitoring solution. While SolarWinds has established itself as a reliable industry standard, companies are increasingly seeking alternatives that offer better alignment with their monitoring needs, budget constraints, and security requirements.

Did Delta's slow web performance signal trouble before CrowdStrike?

The CrowdStrike outage was a reminder of how quickly the dominoes can fall, especially when the foundation is shaky. Delta Air Lines was hit harder than its competitors. While United and American Airlines were able to recover within days, Delta faced ongoing struggles, leading to the cancellation of 7,000 flights over five days.

Tracing the Line: Understanding Logs vs. Traces

In the software space, we spend a lot of time defining the terminology that describes our roles, implementations, and ways of working. These terms help us share fundamental concepts that improve our software and let us manage it better. To help you optimize your software and implement system observability, this blog post shares the key differences between logs and traces.

Common Kafka Cluster Management Pitfalls and How to Avoid Them

Managing a Kafka cluster is no small feat. While Kafka’s distributed messaging system is incredibly powerful, keeping it running smoothly takes careful planning and a keen eye on the details. Small mistakes in Kafka management can quickly add up, leading to bottlenecks, unexpected downtime, and overall reduced performance. Let’s explore some common Kafka management pitfalls and, more importantly, how to steer clear of them.

Anatomy of an OTT Traffic Surge: The Fortnite Chapter 2 Remix Update

On Saturday, November 2, the wildly popular video game Fortnite released its latest game update: Fortnite Chapter 2 Remix. The result was a surge of traffic as gaming platforms around the world downloaded the latest update for the seven-year-old game. Doug Madory looks at how the resulting traffic surge can be analyzed using Kentik’s OTT Service Tracking.

Monitoring domains and DNSSEC properly

First of all, if you own a domain, the following text is for you. In production you obviously want to reduce outages. An outage of a DNS domain takes down all services under that domain, at least from the users’ perspective, no matter whether your LAMP components are all up and running. As usual, roughly speaking, monitoring has to “play end user” to properly discover failures end to end. At best you have an Icinga satellite (e.g.

Prometheus 3.0 and OpenTelemetry: a practical guide to storing and querying OTel data

Over the past year, a lot of work has gone into making Prometheus work better with OpenTelemetry, a move that reflects the growing number of engineers and developers who rely on both open source projects. Historically, Prometheus users have faced a number of challenges when trying to work with OpenTelemetry (and vice versa).

Orchestrated vs. Unorchestrated Data? A Simplified Guide

In today’s data-driven world, data is the lifeblood of businesses. As organizations strive to extract maximum value from their data, the concept of “data orchestration” has emerged as a critical tool. Data orchestration is the process of automating and streamlining data flows between various systems and applications. It involves coordinating data ingestion, transformation, and delivery to ensure data consistency, reliability, and accessibility.
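
As a rough illustration of the definition above, the three coordinated stages (ingestion, transformation, delivery) can be sketched as plain functions run in a fixed order. This is a minimal sketch, not any particular orchestration tool: the function names, the sample records, and the in-memory list standing in for a destination system are all assumptions made for the example.

```python
# Hypothetical stages of an orchestrated data flow. All names and the
# sample records are illustrative assumptions, not from the article.

def ingest() -> list[dict]:
    """Pretend to pull raw records from a source system."""
    return [{"user": "a", "amount": "10"}, {"user": "b", "amount": "5"}]


def transform(rows: list[dict]) -> list[dict]:
    """Normalize types so downstream consumers see consistent data."""
    return [{**row, "amount": int(row["amount"])} for row in rows]


def deliver(rows: list[dict], sink: list[dict]) -> int:
    """Append the records to a destination and report how many were written."""
    sink.extend(rows)
    return len(rows)


def run_pipeline(sink: list[dict]) -> int:
    """Coordinate the stages: each one runs only after the previous one returns."""
    raw = ingest()
    cleaned = transform(raw)
    return deliver(cleaned, sink)


if __name__ == "__main__":
    warehouse: list[dict] = []  # in-memory stand-in for a real destination
    written = run_pipeline(warehouse)
    print(f"delivered {written} records")
```

The point of the sketch is only the ordering: a single coordinating function owns the sequence, so a failure in one stage stops the stages after it instead of delivering partial or untransformed data.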