High-cardinality metrics at scale: why the standard playbook is wrong

The “high cardinality is expensive” claim has become observability’s version of “in this economy”: said so often that nobody questions whether it’s true. Every vendor pricing page invokes it. Every glossary article repeats it. Every architecture diagram shows aggregation buffers placed in front of the storage layer.

OpenTelemetry Monitoring with Netdata

If you’ve standardized on OpenTelemetry (or you’re heading that way), you probably know that the collector gets your data out; where that data lands, and how useful it is once it arrives, are separate problems. Netdata now ingests both OTLP metrics and OTLP logs natively, so your OTel pipelines feed directly into the same monitoring experience as everything else in your infrastructure: same dashboards, same alerting, same query interface. No separate backends, no context switching.

Teach Your AI Coding Agent to Instrument, Monitor, and Troubleshoot Infrastructure with netdata/skills

There’s a growing ecosystem of AI coding agents: Claude Code, Cursor, Copilot, Codex, Gemini CLI, Windsurf, and others. They’re good at writing code, but they don’t inherently know how to instrument that code for observability, configure monitoring infrastructure, or troubleshoot production systems using real telemetry data. That knowledge lives in documentation, runbooks, and the heads of your senior SREs.

Dashboard Playlists: Cycle Through Dashboards in TV Mode

When we shipped TV mode, we heard almost immediately: “Great, but I have five dashboards and one screen.” A single dashboard on a wall display covers one view of your infrastructure. If you want to rotate between your network overview, database health, application metrics, and infrastructure summary, someone has to walk over and click, or you’re buying more screens. Dashboard playlists solve this.

Monitoring Your Azure to Azure Local Migration: One Dashboard for Both Sides

More organizations are moving workloads from Azure public cloud to Azure Local (formerly Azure Stack HCI) than most people realize. The reasons vary: data sovereignty requirements, latency-sensitive workloads that need to be closer to the edge, cost optimization for predictable workloads where reserved cloud capacity doesn’t make financial sense, or regulatory constraints that require data to stay on-premises.

Geo Maps: See Where Your Infrastructure Lives

When your infrastructure is spread across regions, data centers, branch offices, or edge locations, knowing where a node is physically located matters more than people usually admit. During an incident, “the node in the Singapore POP” communicates faster than a hostname. When you’re planning capacity, seeing geographic clustering tells you something that a flat list of nodes doesn’t.

NVIDIA DCGM Collector: Deep GPU Monitoring for Data Center and AI Infrastructure

GPU infrastructure is expensive and increasingly central to production workloads. Whether you’re running ML training jobs, inference serving, video transcoding, or HPC workloads, understanding what your GPUs are actually doing, and what’s going wrong when performance degrades, is not optional.

Misconfigured Alert Detection: Find the Alerts That Need Tuning

Netdata ships with hundreds of stock alerts. They cover a wide range of infrastructure conditions and they’re designed with sensible defaults. But “sensible defaults” and “correct for your environment” are not the same thing. A CPU threshold that’s perfectly reasonable for a build server might generate constant noise on a machine running batch jobs.

Azure Monitor Collector: Monitor Your Entire Azure Infrastructure From Netdata

If you’re running infrastructure on Azure, you’ve probably dealt with the split between your Azure-native monitoring and the rest of your stack. Your VMs, databases, and Kubernetes clusters generate platform metrics through Azure Monitor, but those metrics live in a separate world from the OS-level, application, and on-prem metrics you’re already watching in Netdata.