Operations | Monitoring | ITSM | DevOps | Cloud

Automate infrastructure operations with Datadog Infrastructure Management

Many organizations struggle to track how their cloud infrastructure changes over time. Modern environments span tens of thousands of resources across hundreds of accounts and multiple clouds. Application teams add new services and regions at a rapid pace, increasing the number and variety of resources that need to be managed. These shifts can cause infrastructure configurations to drift from a well-architected state, increasing the risk of service reliability issues and unexpected cloud spend.

Observability in the AI age: Datadog's approach

Ten years ago, Datadog was a single-product company focused on breaking down the silos between dev and ops. As the shift towards the cloud accelerated and organizations transitioned to the new DevOps model, we set out to develop an observability platform that would enable these teams to safely scale faster and answer the essential questions about their services: are they available, secure, compliant, performant, and cost-efficient?

Optimize Kubernetes cluster cost with Datadog Cluster Autoscaler

Running Kubernetes at scale almost always means paying for more compute than you need. To protect reliability, platform and application teams typically overprovision nodes early in development and keep scaling up as they add features and workloads. They are often reluctant to move to smaller or different instance types without a clear picture of how those changes will affect performance or availability. The result is a fleet of underutilized nodes that silently inflate your cloud bill.

Accelerate investigations with AI-powered log parsing

When debugging production issues, investigating security incidents, or analyzing network traffic, engineers and analysts need not only to find the right logs but to make sense of all the dense, unstructured data generated by different systems. Logs rarely ship neatly laid out in a way that facilitates filtering, faceting, or graphing for every possible scenario. As a result, teams often find themselves writing regular expressions or custom parsers on the fly, which can be error-prone and time-consuming.

Monitor Claude Code adoption in your organization with Datadog's AI Agents Console

AI coding assistants are quickly becoming a core part of software engineering workflows, helping developers write, refactor, and review code faster. But without effective monitoring, it can be difficult to know whether these tools are performing reliably and proving useful to engineers. As organizations scale their use of tools like Claude Code, key questions emerge.

Turn feedback into action across your engineering org with Datadog Forms

Engineering teams rely on forms for everything from approvals to checklists, yet the process usually lives outside engineering operations. Spreadsheets, one-off surveys, and external form builders capture inputs, but they create scattered data, slow follow-ups, and manual translation into actionable work. Datadog Forms enables teams to create and share interactive forms directly within Datadog.

Define, run, and scale custom LLM-as-a-judge evaluations in Datadog

Teams deploying LLM applications face a critical blind spot: They can measure speed and cost, but not whether their AI is actually giving good answers. To build user trust in these applications, teams also need to measure response quality, including factual accuracy, safety, and tone. Operational metrics show how a system behaves, but not whether its responses are correct or on brand.

Build custom apps in seconds with conversational AI in App Builder

Datadog App Builder is a low-code tool for creating internal apps, making use of a drag-and-drop interface that allows engineering teams to troubleshoot issues, optimize operations, and enable self-service while connecting directly to their Datadog data and permissions. Now, with conversational AI, teams can go from idea to working prototype even faster.

Coordinate large-scale engineering initiatives with IDP Campaigns

As organizations grow, engineering leaders often need to drive cross-team initiatives such as reducing cloud spend, upgrading runtimes, or strengthening security controls. Tracking this work can quickly become fragmented across spreadsheets, dashboards, and status meetings. Progress is hard to measure, accountability is unclear, and the impact of each effort can be difficult to demonstrate.

Use OpenTelemetry with Observability Pipelines for vendor-neutral log collection and cost control

Today, many DevOps and security teams operate in a world of complex, hybrid, or multi-vendor environments. As more teams look to avoid lock-in by adopting open standards, OpenTelemetry (OTel) is quickly gaining adoption as the primary open source method for DevOps and security teams to instrument and aggregate their telemetry data. However, OTel alone may lack the advanced processing functions, native volume control rules, and hybrid environment support that large organizations need.