Understand session replays faster with AI summaries and smart chapters

Datadog Session Replay gives teams a video-like view of what real users experienced in their applications. Engineers rely on replays to connect errors and slowdowns to actual user behavior, while product managers use them to understand friction and improve critical flows. But finding the right replay and the right moment often means manually scanning long sessions without knowing whether they contain relevant signals.

Search and act across Datadog to resolve issues faster with Bits Assistant

Finding the right information across dashboards, monitors, and telemetry sources takes time, even for experienced engineers. When something breaks, it often means figuring out where to start, rebuilding queries, and jumping between metrics, logs, and traces before you can take action. The challenge isn’t a lack of data but the effort required to surface the right information at the right moment.

How we designed empathetic alert sounds for on-call engineers

Being on call is an essential part of operating reliable distributed systems, but it comes with real human costs such as alert fatigue, sudden wakeups in the middle of the night, and the ongoing anxiety of what the next notification might bring. Many engineers know the feeling: Your phone lights up, a sound cuts through the silence, and your heart rate spikes before you’re even fully awake.

Monitor ClickHouse query performance with Datadog Database Monitoring

ClickHouse is widely used for large-scale analytics, but once it is running in production, it can be difficult to understand how query activity translates into resource usage. Engineers investigating performance issues often struggle to determine which queries consume the most memory, run most frequently, or cause spikes in load. In practice, engineers are left querying system.query_log, tailing server logs, and piecing together information after an incident.

Konstruct product updates: Hosted control planes and multi-cloud

March was an important month for the Konstruct team. We focused on something we’ve heard consistently from teams: reduce the time to value without compromising control. In the previous post, we walked through how Konstruct 0.1–0.3 established the core platform model, introduced templates, and expanded GitOps into something that can represent both infrastructure and applications. With 0.4, we’re taking a more opinionated step forward.

npm axios attack - What happened and how to protect your supply chain

100M+ weekly downloads. One compromised maintainer account. A remote access trojan in two active release branches. This is a 30-minute breakdown of the Axios npm supply chain attack – how it happened, why it was hard to detect, and what any engineering team can do right now to reduce exposure. Nigel Douglas, Head of Developer Relations at Cloudsmith, is joined by Jenn Gile, co-founder of Open Source Malware, a community-driven threat intelligence platform focused on malicious open source packages.

90% AI Adoption. Still Failing. DORA Explains Why.

AI adoption is nearly universal. So why are most teams still struggling? In this session from GitKon, Nathen Harvey, head of DORA at Google Cloud, shares findings from the 2025 DORA State of AI-Assisted Software Development report, drawing on data from nearly 5,000 developers worldwide. The answer isn't more AI. It's what surrounds it.

Best Secure Messaging Apps for Healthcare Workers (2026 Buyer's Guide): OnPage

Secure messaging apps for healthcare workers are platforms that enable HIPAA-compliant communication, real-time collaboration and coordination, and urgent alerting across clinical teams so that staff can respond quickly. In modern hospitals, communication is no longer just about sending messages. It’s about ensuring the right person receives the right information and acts on it quickly.

Managing Kubernetes deployment YAML across multi-cloud enterprise fleets

At enterprise scale, managing provider-specific Kubernetes YAML across multiple clouds creates crippling configuration drift and operational toil. By adopting an agentic Kubernetes management platform, infrastructure teams abstract cloud-specific configurations (like ingress controllers and storage classes) into a single, declarative intent that automatically reconciles across 1,000+ clusters.