Latest News

Proactive Monitoring for NetApp ONTAP

May 20, 2026 By NiCE IT Mgmt In NiCE IT Mgmt

This whitepaper explores how proactive monitoring, using Microsoft SCOM enhanced with the NiCE NetApp ONTAP Management Pack, enables IT teams to detect issues early, optimize storage usage, and ensure reliable, predictable performance across both on-premises and hybrid-cloud infrastructures.

Error Budget in SRE: The Complete Guide (2026)

May 20, 2026 By Nuno Tomas In isDown

An error budget is the acceptable amount of unreliability permitted by your SLO over a defined time window. It is not a target. It is not a stretch goal. It is a hard ceiling that, when breached, should trigger a pre-agreed organizational response: feature freezes, postmortems, or infrastructure investment. The formula is blunt:

Error Budget = 1 - SLO Target
Error Budget (time) = (1 - SLO Target) × Window Duration

For a 30-day window: That last number should make you uncomfortable.
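The two formulas in this excerpt can be worked through for a 30-day window with a short sketch. The SLO targets in the loop are illustrative assumptions, not figures taken from the post, and the function names are hypothetical.

```python
def error_budget_fraction(slo_target: float) -> float:
    """Error Budget = 1 - SLO Target, with the target as a fraction (0.999 = 99.9%)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    return 1.0 - slo_target


def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Error Budget (time) = (1 - SLO Target) x Window Duration, in minutes."""
    window_minutes = window_days * 24 * 60  # 30 days -> 43,200 minutes
    return error_budget_fraction(slo_target) * window_minutes


# Hypothetical SLO targets, chosen for illustration only.
for slo in (0.99, 0.999, 0.9999):
    print(f"SLO {slo:.2%}: {error_budget_minutes(slo):.2f} minutes of budget per 30 days")
```

Under these assumptions, a 99.9% target over 30 days leaves roughly 43.2 minutes of allowed unreliability, and each additional nine divides that figure by ten.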

Automation will reshape IT operations within three years, say a third of teams

May 20, 2026 By SolarWinds In SolarWinds

SolarWinds research reveals growing confidence in automation, but concerns around accuracy, skills, and oversight remain.

How Airbnb Built a High-Volume Metrics Pipeline with OpenTelemetry and vmagent

May 20, 2026 By Pablo Fernandez In VictoriaMetrics

We always knew that Airbnb’s engineering operates at a completely different scale, and their new high-volume metrics pipeline is proof of that. This is one of those rare stories where scale and efficiency go hand in hand: they modernized their observability stack with open source components and reduced cost by an order of magnitude. Airbnb is now processing more than 100 million samples per second on a single production cluster.

Building a CloudWatch metrics pipeline: parsing OpenTelemetry data

May 20, 2026 By Jeff Kreeftmeijer In AppSignal

AWS delivers CloudWatch metrics in OpenTelemetry format via Firehose, but AppSignal uses its own internal format. Building the parser to bridge these two formats presented several technical challenges. The metrics arriving through this pipe power AWS automated dashboards. When AppSignal detects metrics from a supported AWS service, it creates a dashboard for it automatically, with pre-built charts grouped by category: compute, databases, networking, messaging, storage, and others.

3 DNS Records Most Companies Forget to Monitor

May 20, 2026 By DNS Spy In DNS Spy

Here are the three records most teams forget to monitor — and what happens when they break.

Anomaly Detection in HEAL Software AIOps

May 20, 2026 By HEAL Software In HEAL Software

Every week, thousands of engineers, SREs, and IT leaders type questions about anomaly detection into ChatGPT, Reddit, and Stack Overflow. They are all trying to answer the same underlying question: why do production incidents keep catching us off guard, and how do we stop them?

Teach Your AI Coding Agent to Instrument, Monitor, and Troubleshoot Infrastructure with netdata/skills

May 20, 2026 By Shyam Sreevalsan In netdata

There’s a growing ecosystem of AI coding agents: Claude Code, Cursor, Copilot, Codex, Gemini CLI, Windsurf, and others. They’re good at writing code, but they don’t inherently know how to instrument that code for observability, configure monitoring infrastructure, or troubleshoot production systems using real telemetry data. That knowledge lives in documentation, runbooks, and the heads of your senior SREs.

The Productivity Tax of Repeat IT Failures in Technology Companies

May 20, 2026 By Chanté Frazer In Nexthink

Technology companies are being pushed to deliver faster outcomes while justifying growing investment in AI, SaaS, and digital infrastructure. But productivity does not improve just because new tools are deployed. It improves when employees can use those tools without the constant drag of slow devices, unstable applications, and fixes that do not fully solve the problem. That is the productivity tax of digital friction.

How to Create Your Own Plugins and Check Commands in Icinga 2

May 20, 2026 By Sukhwinder Dhillon In Icinga

If you’ve been using Icinga 2 for a while, you probably know the built-in checks cover a lot of ground: disk space, CPU, memory, ping. But sooner or later you’ll run into something specific to your setup that no existing plugin handles. That’s where writing your own plugin comes in. The good news? It’s simpler than it sounds. Icinga 2 doesn’t care what language your plugin is written in. It just runs the script, reads the exit code, and displays the output. That’s it.
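As a minimal sketch of the "run the script, read the exit code, display the output" contract described here, the following Python plugin checks disk usage. It assumes the standard Monitoring Plugins exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the thresholds, the `used_pct` label, and the function names are hypothetical and are not taken from the post.

```python
import shutil

# Exit codes from the Monitoring Plugins convention that Icinga 2 follows.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(used_pct: float, warn: float = 80.0, crit: float = 90.0) -> tuple[int, str]:
    """Map a usage percentage to an exit code and a one-line status message."""
    if used_pct >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif used_pct >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Text before "|" is the human-readable output; text after it is performance data.
    text = f"DISK {label} - {used_pct:.1f}% used | used_pct={used_pct:.1f}%;{warn:g};{crit:g}"
    return code, text


def main(path: str = "/") -> int:
    """Print the status line and return the exit code for the given mount point."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Anything that prevents the check from running is reported as UNKNOWN.
        print(f"DISK UNKNOWN - cannot read {path}: {exc}")
        return UNKNOWN
    code, text = evaluate(100.0 * usage.used / usage.total)
    print(text)
    return code
```

To run this as a plugin, the script would end with `raise SystemExit(main())` so that the return value becomes the process exit code, and a `CheckCommand` object in the Icinga 2 configuration would point at the file.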
