
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Obkio Microsoft Teams Monitoring vs. Microsoft Teams Admin Center

Most IT teams rely on Microsoft Teams Admin Center as their default monitoring tool to find and fix Microsoft Teams issues, but there's a gap between what it shows and what actually causes call quality problems. Teams Admin Center gives you Microsoft's perspective on what happened after an MS Teams call ended. It doesn't tell you what was happening on your network, on your users' devices, or in the five minutes before the complaints started coming in.

What's New with Progress WhatsUp Gold 2026.0

Progress WhatsUp Gold 2026.0 helps IT teams improve network visibility, strengthen security, and work more efficiently. In this recorded webinar, explore what's included in this free upgrade for customers with an active service agreement, and learn how Progress WhatsUp Gold 2026.0 can deliver proactive visibility with trusted security across your IT infrastructure.

April 2026: IsDown Users Saved 16.5 Hours with Early Outage Detection

In April 2026, IsDown's early detection system gave users a 3.6-hour head start on a major outage — plenty of time to implement workarounds before the vendor even acknowledged the problem. Across 45 early detections, our users saved a collective 16.5 hours by knowing about outages an average of 22 minutes before official status pages were updated.

Real-Time Database Monitoring: Solving Database Latency with Zero-Code eBPF Tracing

In high-throughput database environments, a latency spike is rarely a simple story. Modern data layers are distributed, stateful, and constantly changing as shards move, nodes rebalance, caches warm, queries evolve, and connections churn. In practice, spikes usually trace back to one of three places. For many SRE and Platform teams, the real challenge is disconnected tooling. As one engineering lead put it during a recent technical workshop: "It's all disconnected."

Stop ECS Containers From Collapsing Into One Service in OpenTelemetry

Why ECS containers collapse under service.name = aws_ecs and how to fix it for both the EC2 launch type and Fargate, including the resource-vs-log-record pitfall that quietly breaks log filtering. Prathamesh works as an evangelist at Last9, runs SRE Stories, where SRE and DevOps folks share their stories, and maintains o11y.wiki, a glossary of all terms related to observability.

April 2026 Early Warning Signals

April saw widespread disruptions across SaaS platforms, developer tools, and cloud services, with login failures, pipeline issues, and general service outages among the most common problems. StatusGator’s Early Warning Signals consistently identified these incidents ahead of official provider updates. In several cases, the lead time was significant. Bitbucket pipeline failures were detected 1 hour 17 minutes before acknowledgment, while Claude performance issues surfaced 59 minutes early.

Telemetry Talks ep 4: Retroactive sampling and OpenTelemetry

This episode of Telemetry Talks explores the evolution of an OTLP/gRPC tracing pipeline for VictoriaTraces, built with OpenTelemetry and VictoriaMetrics, including a shift from the standard gRPC-Go library to a simplified HTTP/2-based implementation to reduce complexity and improve flexibility. Together with our guest, Jiekun, we revisited the ideas from the VictoriaMetrics KubeCon talk on tail-based and retroactive sampling and their impact on the broader OpenTelemetry community.

When Dashboards Start Teaching the System: Why Selector's Natural Language Querying Matters

Operations teams have lived with the same frustrating tradeoff for years: the data exists, but getting to the right answer often takes too much time and too much expertise. Engineers are expected to know platform-specific query languages, navigate layers of dashboards, and understand exactly where the right visualization lives before they can even begin troubleshooting. That approach can work in smaller environments, but as infrastructure grows more distributed and complex, it becomes a bottleneck.