Monitoring Your Node.js App Health on Fly.io

Your Node.js service has just been containerized and deployed across continents with a single fly deploy command. Everything seems fine, but a week later a user messages you to say the app is slow. You run fly logs, scroll through the output, and find nothing out of the ordinary. The Fly.io dashboard says the app is running and healthy, yet something behind the scenes is slowing it down, and you have no idea what. You don't even know where to start.

Seven early warning signs you're heading toward a governance crisis

Governance failures rarely start with a major outage or a failed audit. They start with small, localized signals that teams treat as isolated annoyances. By the time a crisis becomes visible, the structural breakdown is already expensive to fix. If you are in IT leadership or platform engineering, you have likely seen these signs. The risk is ignoring them until they consolidate into a systemic failure.

Redgate Test Data Manager Updates - March 2026

This is a guest post from James Hemson. Redgate Test Data Manager's latest release adds Entra ID authentication, multi-target anonymization, and direct treatment code editing, along with workflow improvements that make pipeline management faster and more flexible.

Entra ID Authentication

You can now connect to SQL Server using token-based authentication via Azure Entra ID, for both anonymization and subsetting.

Native OpenTelemetry inside Alloy: Now you can get the best of both worlds

We're big proponents of OpenTelemetry, which has quickly become the unified standard for delivering metrics, logs, traces, and even profiles. It's an essential component of Alloy, our popular telemetry agent, but we're also aware that some users would prefer a more "vanilla" OpenTelemetry experience.

6 Key Roles Every DEX Team Needs

Digital employee experience doesn’t fail because of technology. It fails because of operating models. Many digital workplace leaders invest in visibility tools, dashboards, automation capabilities, and sentiment platforms. And yet, months later, they’re still stuck in reactive mode. Tickets are down slightly. Reporting is better. But the organization hasn’t fundamentally shifted.

How to Solve "Cannot Reproduce" Bugs That Cost Support Teams Hours

Support teams frequently face vague customer reports and incomplete data, yet they need to resolve issues quickly on their own, without escalating to developers. In this article, learn how to equip support engineers with tools that diagnose root causes in minutes, so more issues are resolved without escalation. We also explore how runtime context can eliminate the "Reproduction Tax" that "cannot reproduce" bugs impose, bringing technical certainty to diagnosis at scale.

How to Reduce MTTR with AI-Powered Runtime Diagnosis

Reducing Mean Time to Resolution (MTTR) in production systems requires understanding failure behavior in real time. While AI code agents have significantly accelerated software development and deployment, incident resolution remains constrained by incomplete, pre-captured telemetry. AI SRE tools improve signal correlation, but reducing MTTR requires runtime-verified diagnosis that confirms execution behavior directly in production systems.

Syncing LDAP Users & Groups with the Icinga Notifications Web API

If you’re running Icinga in a mid-to-large organization, chances are your users and teams are already defined in LDAP or Active Directory. Manually re-creating contacts and contact groups in Icinga Notifications Web is tedious and error-prone, but thankfully, it doesn’t have to be that way. The Icinga Notifications Web REST API gives you everything you need to automate this synchronization. In this post, we’ll walk through how to build a reliable LDAP-to-Icinga sync using the v1 API.

Introducing Aiven's PG Studio

Aiven's PG Studio is now in Early Availability, allowing you to work with your Aiven for PostgreSQL instances directly in the Aiven Console. Over the last several decades, PostgreSQL has steadily climbed to the top of the popularity rankings among DBAs and application developers. With the rise of AI development, PG has become a tool you can build with at many different stages of development. That work starts at the home of your data: the Aiven for PostgreSQL service page in the Aiven Console.

On-call compensation for IT engineers in 2026

Imagine it’s 2 AM and a critical system flatlines without warning. A bleary-eyed on-call engineer scrambles to restore service, shielding customers from a major outage that could torpedo your next Service Level Objective (SLO) review. Yet when daylight returns, debates over fair on-call compensation start all over again: What’s “just” pay for sleepless nights, unpredictable pings, and rapid-fire incident responses?