
Native OpenTelemetry inside Alloy: Now you can get the best of both worlds

We're big proponents of OpenTelemetry, which has quickly become a new unified standard for delivering metrics, logs, traces, and even profiles. It's an essential component of Alloy, our popular telemetry agent, but we're also aware that some users would prefer to have a more "vanilla" OpenTelemetry experience.

6 Key Roles Every DEX Team Needs

Digital employee experience doesn’t fail because of technology. It fails because of operating models. Many digital workplace leaders invest in visibility tools, dashboards, automation capabilities, and sentiment platforms. And yet, months later, they’re still stuck in reactive mode. Tickets are down slightly. Reporting is better. But the organization hasn’t fundamentally shifted.

How to Solve "Cannot Reproduce" Bugs That Cost Support Teams Hours

Support teams frequently face vague customer reports and incomplete data but need to offer fast resolutions autonomously without escalating to developers. In this article, learn how to equip support engineers with tools to diagnose root causes in minutes, increasing self-sufficient issue resolution. We explore eliminating the ‘Reproduction Tax’ for ‘cannot reproduce’ bugs using runtime context to achieve technical certainty at scale.

How to Reduce MTTR with AI-Powered Runtime Diagnosis

Reducing Mean Time to Resolution (MTTR) in production systems requires understanding failure behavior in real time. While AI code agents have significantly accelerated software development and deployment, incident resolution has remained constrained by incomplete pre-captured telemetry. AI SRE tools improve signal correlation, but MTTR reduction requires runtime-verified diagnosis that confirms execution behavior directly in production systems.

Syncing LDAP Users & Groups with the Icinga Notifications Web API

If you’re running Icinga in a mid-to-large organization, chances are your users and teams are already defined in LDAP or Active Directory. Manually re-creating contacts and contact groups in Icinga Notifications Web is tedious and error-prone, but thankfully, it doesn’t have to be that way. The Icinga Notifications Web REST API gives you everything you need to automate this synchronization. In this post, we’ll walk through how to build a reliable LDAP-to-Icinga sync using the v1 API.
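As a rough illustration of what such a sync has to decide, here is a minimal sketch of the reconciliation step only. It assumes both sides have already been fetched and reduced to simple records; the `Contact` type and the `plan_sync` function are hypothetical names for this example, and no LDAP or Icinga API calls are shown because the excerpt does not describe them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    """Hypothetical record shared by both sides of the sync."""
    username: str
    full_name: str


def plan_sync(ldap_users, icinga_contacts):
    """Return (to_create, to_update, to_delete), treating LDAP as the source of truth."""
    ldap_by_name = {u.username: u for u in ldap_users}
    icinga_by_name = {c.username: c for c in icinga_contacts}

    # Present in LDAP only: must be created in Icinga.
    to_create = [ldap_by_name[n] for n in sorted(ldap_by_name.keys() - icinga_by_name.keys())]
    # Present in Icinga only: no longer in LDAP, so candidates for removal.
    to_delete = [icinga_by_name[n] for n in sorted(icinga_by_name.keys() - ldap_by_name.keys())]
    # Present on both sides but with differing attributes: update to the LDAP value.
    to_update = [
        ldap_by_name[n]
        for n in sorted(ldap_by_name.keys() & icinga_by_name.keys())
        if ldap_by_name[n] != icinga_by_name[n]
    ]
    return to_create, to_update, to_delete


ldap = [Contact("alice", "Alice A."), Contact("bob", "Bob B.")]
icinga = [Contact("bob", "Robert B."), Contact("carol", "Carol C.")]
create, update, delete = plan_sync(ldap, icinga)
```

Computing the plan separately from applying it makes a dry run possible before any change is sent to the API.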

Mastering the Diagnostic Pivot from Health Policy to Pod

In the world of modern microservices, scale is a necessary challenge. Enterprise service inventories start modestly with a handful of components, only to balloon to hundreds over time. Traditional monitoring approaches cannot support that weight. The more organizations build, the more work they create, often only to keep systems running.

Why Generic AI Fails in Ops: What Trustworthy Actually Requires

Enterprise operations reached a point where complexity outpaced human interpretation and outgrew the capabilities of generic AI. As environments became more distributed and interdependent, every incident, anomaly, and degradation produced ripple effects across systems that require context, lineage, and reasoning. Yet most AI models were not built for this reality. They were trained for general knowledge tasks, not the deeply connected operational truths that define enterprise performance.

Evaluating Observability Tools for the AI Era

Every observability vendor has an AI story right now. Most have an MCP. Many have a chatbot. All have a demo where the AI finds the root cause of an incident in thirty seconds and everyone in the room nods. In the context of a public demo, these tools look almost identical. Ask the AI a question, the tool returns an answer, and the engineer fixes the bug. Impressive. But if you buy based on the demo, you may end up with an AI layer that looks great on a call and disappoints in production.

Claude outage analysis: What happened on March 11

On March 11, 2026, users around the world began reporting problems with Claude, including login failures, API errors, and stalled responses. While the disruption did not affect every user, reports quickly showed that the issue was widespread. StatusGator began receiving outage reports at 13:56 UTC. Using its Early Warning Signals system, StatusGator detected the growing incident at 14:22 UTC. The provider officially acknowledged the outage later at 14:44 UTC.
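The gaps between those three timestamps can be worked out with a short calculation. This is only an arithmetic sketch using the times quoted above; the variable and function names are chosen for this example.

```python
from datetime import datetime, timezone


def utc(hhmm):
    """Build a UTC timestamp on March 11, 2026 from an 'HH:MM' string."""
    hour, minute = map(int, hhmm.split(":"))
    return datetime(2026, 3, 11, hour, minute, tzinfo=timezone.utc)


def minutes_between(start, end):
    """Whole minutes elapsed from start to end."""
    return int((end - start).total_seconds() // 60)


first_reports = utc("13:56")   # first outage reports received
detected = utc("14:22")        # incident detected by Early Warning Signals
acknowledged = utc("14:44")    # provider's official acknowledgment

reports_to_detection = minutes_between(first_reports, detected)
detection_to_ack = minutes_between(detected, acknowledged)
reports_to_ack = minutes_between(first_reports, acknowledged)
```

With these inputs, detection came 26 minutes after the first reports and 22 minutes before the official acknowledgment, 48 minutes after the first reports.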

Multi-Language Status Page Widgets: Customize Widget Messages in Any Language

If your product serves users in multiple regions, your status page widget shouldn't be stuck in English. A customer in São Paulo seeing "All Systems Operational" when they expect "Todos os Sistemas Operacionais" is a small friction, but small frictions compound. It signals that their language isn't a priority, and it adds cognitive load during the exact moment they're checking whether something is broken. Until now, IsDown widgets shipped with hardcoded English messages. That's changed.
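To make the idea concrete, here is a minimal sketch of how a localized status message could be resolved with a fallback to English. It is not IsDown's implementation or configuration format; the `MESSAGES` catalog and `resolve_message` function are hypothetical, and the only strings used are the two quoted above.

```python
# Hypothetical message catalog keyed by language code.
MESSAGES = {
    "en": "All Systems Operational",
    "pt": "Todos os Sistemas Operacionais",
}


def resolve_message(locale, catalog=MESSAGES, default="en"):
    """Pick a message for a locale such as 'pt-BR', falling back to the default."""
    # 1. Exact match on the full locale tag, if the catalog has one.
    if locale in catalog:
        return catalog[locale]
    # 2. Match on the language part only ('pt-BR' -> 'pt').
    language = locale.split("-")[0].lower()
    if language in catalog:
        return catalog[language]
    # 3. Fall back to the default language.
    return catalog[default]


sao_paulo = resolve_message("pt-BR")
berlin = resolve_message("de-DE")
```

Here a visitor with a `pt-BR` locale gets the Portuguese message, while a locale with no translation in the catalog falls back to English.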