Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

What is Kubernetes? Explained in 2 Minutes

What is Kubernetes, and how do companies like Netflix handle millions of users without crashing? In this quick guide, we break down Kubernetes in simple terms — from containers to pods, nodes, and the control plane — so you can understand how modern cloud applications stay reliable and scalable. Kubernetes acts like an air traffic controller for your apps, automatically managing where they run, restarting them if they fail, and balancing traffic across machines. Whether you're new to cloud computing or brushing up on DevOps basics, this video gives you a clear, beginner-friendly explanation.
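
To make the air traffic controller analogy concrete, here is a minimal toy sketch of the two behaviours the guide mentions: replacing failed instances and spreading traffic across the ones that remain. It is a simplified model, not the Kubernetes API; the function names `reconcile` and `route`, the pod dictionaries, and the `replacement-N` names are all hypothetical and chosen only for illustration.

```python
from itertools import cycle


def reconcile(desired_replicas, pods):
    """Return the pod list after one pass: drop unhealthy pods, top up to the desired count."""
    running = [p for p in pods if p["healthy"]]
    missing = desired_replicas - len(running)
    # Start a hypothetical replacement for every pod that is missing.
    for i in range(missing):
        running.append({"name": f"replacement-{i}", "healthy": True})
    # If there are more healthy pods than desired, keep only the first ones.
    return running[:desired_replicas]


def route(requests, pods):
    """Assign requests to pods in round-robin order."""
    targets = cycle(p["name"] for p in pods)
    return {request: next(targets) for request in requests}


# Hypothetical cluster state: three pods wanted, one of them has failed.
pods = [
    {"name": "web-1", "healthy": True},
    {"name": "web-2", "healthy": False},
    {"name": "web-3", "healthy": True},
]
healthy_pods = reconcile(3, pods)
print([p["name"] for p in healthy_pods])
print(route(["r1", "r2", "r3", "r4"], healthy_pods))
```

A real cluster runs this kind of comparison between desired and actual state continuously; the sketch performs a single pass.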

Benchmarking Kubernetes Log Collectors: vlagent, Vector, Fluent Bit, OpenTelemetry Collector, and more

At VictoriaMetrics, we built vlagent as a high-performance log collector for VictoriaLogs. To validate its performance and correctness under a realistic, production-like load, we developed a benchmark suite and ran it against 8 popular log collectors. This post covers the methodology, throughput results, resource usage, and delivery correctness. We’ve made all benchmark configurations and source code public, so you can reproduce and verify the results independently.

How to Manage Icinga with Ansible Webinar

Managing monitoring environments shouldn’t be a manual chore. In this hands-on webinar, we show you how to fully automate your Icinga infrastructure using the Ansible Collection for Icinga. We take you step by step through everything from installing Icinga 2 to configuring master instances, setting up monitoring agents, building core objects, and integrating common components like Icinga Web, all driven by Ansible.

Claude Code + Lightrun MCP: Your AI Agent Now Has Live Runtime Vision

Claude Code, Anthropic’s coding agent, now integrates with Lightrun through MCP. AI code assistants have been flying blind. Google’s 2025 DORA report found that this is causing an almost 10% increase in code instability. Even with up to 1M tokens of context available in Claude, this powerful agent cannot see how the code it writes actually behaves inside a live system under real traffic, real dependencies, and a load of 10,000 requests per second.

Claude Code is running bash commands on your infrastructure. Here's how to watch it.

I’ve been staring at Claude Code telemetry for the past few weeks, and I keep noticing the same thing: most teams drop it into their environment, say “it’s amazing,” and have absolutely no idea what it’s actually doing at the system level. That’s fine for a personal dev tool. It’s not fine when you’ve rolled it out to 50 engineers.

How to Perform a Network Health Check: Step-by-Step Guide

Your apps are slow. Users are complaining. You're staring at a dashboard trying to figure out what broke and when. Sound familiar? This is the reality of reactive network monitoring. By the time someone opens a ticket, the issue has already been affecting performance for minutes, sometimes hours. A network health check flips that script. Instead of chasing problems after the fact, you're catching them before users ever notice.

You're probably overdue for a Sentry SDK upgrade

Session Replay. Structured logs. AI monitoring. Automatic OpenTelemetry tracing. Feature flag tracking. If you haven't seen these in your Sentry dashboard, your SDK version is probably the reason. Whether you're on @sentry/react, @sentry/nextjs, @sentry/vue, @sentry/angular, @sentry/sveltekit, or any other @sentry/* package, they all version together. When we say v10, we mean all of them.
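
Because the @sentry/* packages version together, one quick way to see whether a project is behind is to compare the major version of each one against the target. The sketch below assumes the dependencies have already been read from a package.json into a dictionary; the helper names `sentry_majors` and `needs_upgrade` and the sample version ranges are hypothetical, not part of any Sentry tooling.

```python
import re


def sentry_majors(dependencies):
    """Map each @sentry/* package to the first number in its version range."""
    majors = {}
    for name, spec in dependencies.items():
        if not name.startswith("@sentry/"):
            continue
        match = re.search(r"\d+", spec)
        if match:
            majors[name] = int(match.group())
    return majors


def needs_upgrade(dependencies, target_major=10):
    """Return the @sentry/* packages whose major version is below target_major."""
    majors = sentry_majors(dependencies)
    return sorted(name for name, major in majors.items() if major < target_major)


# Hypothetical dependency block, as it might appear in a package.json.
example = {
    "@sentry/react": "^8.47.0",
    "@sentry/nextjs": "^10.1.0",
    "react": "^18.3.1",
}
print(needs_upgrade(example))  # prints ['@sentry/react']
```

The check treats the first number in the range string as the major version, which holds for common forms such as `^10.1.0` or `~10.1.0` but not for every range syntax.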