Operations | Monitoring | ITSM | DevOps | Cloud

How to Migrate an Icinga 2 Master in a High Availability Setup

Moving an Icinga 2 master to a new machine requires careful preparation, especially in a master-to-master high availability setup. In production environments, such migrations are often part of broader infrastructure changes, platform standardization, or long-term monitoring strategy decisions. This guide walks you through the process step by step, ensuring a smooth migration without service interruption while keeping your monitoring platform stable and consistent across the environment.

Monitor Fortinet FortiManager performance in Datadog

As enterprises scale, teams often find it harder to identify user-reported issues. Software-defined wide area networks (SD-WANs) can make it easier to add branch offices, but they can also make it more challenging to distinguish connectivity degradation from changes in application behavior. FortiManager provides a centralized control plane for Fortinet Secure SD-WAN and reduces operational complexity.

AWS CloudFront Outage (Feb 2026): Timeline, Cascade, and Lessons

At approximately 9:15 PM UTC on February 10, 2026, Amazon CloudFront began returning NXDOMAIN responses for DNS queries against specific distributions. In practical terms: DNS was telling users that services behind those distributions simply didn't exist. The root cause was a DNS resolution failure within CloudFront's infrastructure that quickly spread to eight interconnected AWS services.

Sponsored Post

From cloud costs to cloud value: The role of performance analytics in increasing ROI

Many cloud providers offer services that scale with usage. However, unanticipated overutilization of compute instances, serverless functions, or managed databases can quickly drive up costs. Managing these resources effectively is crucial for keeping cloud spending predictable.

VictoriaMetrics at FOSDEM, Cloud Native Days France, and CfgMgmtCamp Ghent

Last week, members of the VictoriaMetrics team, including me, spoke at three very different but equally important community events: FOSDEM in Brussels, Cloud Native Days France in Paris, and CfgMgmtCamp in Ghent. Each event drew a different crowd with its own expectations, making them a good way to see where open source observability stands today and how VictoriaMetrics is adapting to real-world needs. The talks we gave were snapshots of the problems we are actively working on.

How to run checks on internal services with Grafana Cloud Synthetic Monitoring

Many critical services run inside private networks, where traditional monitoring tools and practices can’t offer full visibility. This makes it difficult to validate service availability and performance before problems impact your users. Synthetic Monitoring — a Grafana Cloud solution that helps you proactively monitor the performance of your applications and services — addresses this gap with a feature known as private probes.

What is DEX Ops?

For decades, IT operations have been built around incidents, SLAs, and ticket closure rates. Success has been defined by how quickly tickets are resolved and whether service levels are met. But the modern digital workplace has changed. Employee productivity, digital adoption, collaboration quality, and business performance depend on far more than ticket metrics. A device that “works” but performs poorly still erodes productivity.

The Architecture Shift Powering Network Observability

If you work in network operations, you know that the only constant is the increasing complexity of the infrastructure you manage. The days of installing a monolithic software package on a single bare-metal server and letting it hum along for years are largely behind you. The software industry has shifted toward cloud-native architectures, microservices, and containerization. While these changes promise agility and scalability, they also introduce significant operational complexity.

A Step-by-Step Look at How Agentic, Autonomous ITOps Resolves Incidents

Agentic, autonomous ITOps improves incident response by carrying context from detection through resolution, reducing noise, delay, and manual coordination. Most incident response doesn’t falter for lack of data; monitoring systems generate more than enough signals. The problem is that understanding those signals, and deciding what to do with them, happens in fragments. Engineers move between dashboards, logs, tickets, and chat threads, stitching together context by hand.