
Bits AI Dev Agent: Automatically identify issues and generate code fixes

Bits AI Dev Agent is an AI-powered coding assistant in Datadog designed to reclaim developer productivity by autonomously monitoring telemetry data, identifying key issues, and generating production-ready pull requests. Developers receive asynchronous, context-rich PRs with clear explanations, allowing them to shift their focus from troubleshooting to reviewing solutions and building better code.

Introducing Bits AI SRE, your AI on-call teammate

Bits AI SRE is built to autonomously investigate alerts and coordinate incident response. Integrated with Datadog, Slack, GitHub, Confluence, and more, Bits analyzes telemetry, reads documentation, and reviews recent deployments to determine the root cause of alerts, often before you've even opened your laptop. If you're using Datadog On-Call, you can also view Bits's findings right from your phone, so you're always one step ahead, no matter where you are.

Datadog Incident Response: Unify remediation and communication

With Datadog's new AI voice agent in Incident Response, you can quickly get up to speed on the issue and start taking action directly from your phone. Handoff notifications make it easy to jump straight to the relevant context and quickly communicate with other responders. Finally, our status pages enable you to automatically update users on your remediation progress.

Monitor Lambda-hosted web apps with the Lambda Web Adapter integration

As organizations migrate their legacy web applications from containerized or server-based deployments to serverless environments, they often run into a critical compatibility challenge. Traditional web frameworks like Flask, Express, or Spring Boot are designed to run on persistent HTTP servers, not in event-driven, stateless environments like AWS Lambda. The AWS Lambda Web Adapter bridges this gap by allowing teams to run web server-based applications inside Lambda with minimal changes.

Choosing the right OpenTelemetry Collector distribution

The OpenTelemetry (OTel) Collector plays a central role in collecting, processing, and exporting telemetry data. If you’re deploying the Collector in production, chances are you’ve reached for the otelcol-contrib distribution. It’s the easiest, most flexible, and most documented distribution, used in nearly every demo and getting-started guide. But here’s the catch: It’s not actually recommended for production use.

Missing container-layer metadata: Why it happens and what you can do

Container image layers provide valuable insight into what goes into a container, including which packages were installed, what commands were run, and where vulnerabilities might live. The metadata associated with these image layers is essential for debugging, optimizing image size, and managing security risks. However, key container-layer metadata fields such as digest, size, and created_by are sometimes missing, which can disrupt important tasks.
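
As a rough illustration of what "missing" means here, the sketch below scans a list of layer records and reports which of the three fields named above are absent or empty. The record shape (a plain dictionary per layer) and the helper names are assumptions made for this example; they do not reflect Datadog's schema or any registry API.

```python
# Fields called out above as sometimes missing from container-layer metadata.
REQUIRED_FIELDS = ("digest", "size", "created_by")


def missing_layer_fields(layer):
    """Return the required fields that are absent or empty in one layer record.

    `layer` is a hypothetical dict; a size of 0 counts as present because
    empty layers are legitimate.
    """
    return [field for field in REQUIRED_FIELDS if layer.get(field) in (None, "")]


def audit_layers(layers):
    """Map each layer index to its missing fields, skipping complete layers."""
    report = {}
    for index, layer in enumerate(layers):
        missing = missing_layer_fields(layer)
        if missing:
            report[index] = missing
    return report


if __name__ == "__main__":
    # Illustrative sample data, not taken from a real image.
    sample_layers = [
        {"digest": "sha256:abc", "size": 1024, "created_by": "RUN apt-get update"},
        {"digest": "", "size": 0},
    ]
    print(audit_layers(sample_layers))
```

A check like this could run wherever layer metadata is collected, so that gaps are flagged before they affect debugging or vulnerability triage.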

Proactively troubleshoot with synthetic testing and distributed tracing

As your application grows in complexity, identifying the root cause of issues becomes increasingly difficult. Many monitoring strategies make this even harder by siloing frontend and backend data. To effectively troubleshoot problems that spread across your app, you need visibility not just into each part of your stack, but also into how these parts interact.

Monitor agents built on Amazon Bedrock with Datadog LLM Observability

As large language models (LLMs) grow more powerful, organizations are deploying agentic AI applications to tackle complex, multi-step tasks. With Amazon Bedrock Agents, developers can orchestrate these agents to manage tasks such as triggering serverless functions, calling APIs, accessing knowledge bases, and maintaining contextual conversations—all while breaking down complex user requests or tasks into manageable steps.

A look back at DASH 2025

DASH 2025 brought the Datadog community together like never before. During our biggest event yet, thousands of attendees gathered at the North Javits Center in New York City for two and a half days of content, learning, and community, where they deepened their knowledge and connected with peers. Here's a quick look back at some of the highlights from this year's DASH.

Beyond Metrics: How We Reimagined Incident Response with RUM

When your monitoring tools and logs tell you everything's fine, but users can't access critical healthcare services, where do you look? Our team discovered that Real User Monitoring (RUM) isn't just for tracking page load times and user journeys – it's a powerful incident response tool that can uncover issues traditional monitoring misses entirely.