
Built for Engineers: Datadog's Vision for the Future

Datadog was built by engineers, for engineers. Datadog Co-founder & CEO Olivier Pomel opened the keynote with a clear message: observability, security, and AI are converging. From infrastructure to AI agents, the future of engineering requires one unified platform. Catch all the product announcements on our YouTube channel to see what's next in observability and security!

How we've created a successful FinOps practice at Datadog

When you adopt FinOps to maximize the value of your cloud spending, you may have some simple first steps you can take to gain cost efficiency. For example, you can find and delete any unused resources to quickly realize a one-time optimization. But the ongoing work to manage cloud costs becomes complex as your organization grows, your infrastructure spans multiple clouds, and you can't easily see the full value of your cloud spending by tracking only the bottom line.
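One of those simple first steps can be sketched in a few lines. The inventory format and field names below are hypothetical, purely for illustration; in practice this data would come from your cloud provider's APIs or a cost-management tool.

```python
# Sketch: flag unused resources (here, unattached volumes) for a
# one-time cost optimization. Inventory schema is hypothetical.

def find_unused_volumes(inventory):
    """Return (id, monthly_cost) for volumes with no attachments."""
    return [
        (v["id"], v["monthly_cost"])
        for v in inventory
        if v["type"] == "volume" and not v.get("attachments")
    ]

inventory = [
    {"id": "vol-1", "type": "volume", "attachments": ["i-123"], "monthly_cost": 8.0},
    {"id": "vol-2", "type": "volume", "attachments": [], "monthly_cost": 12.5},
    {"id": "vol-3", "type": "volume", "monthly_cost": 4.0},
]

unused = find_unused_volumes(inventory)
savings = sum(cost for _, cost in unused)
print(unused)   # [('vol-2', 12.5), ('vol-3', 4.0)]
print(savings)  # 16.5
```

A one-off sweep like this captures the quick win; the ongoing, multi-cloud work the post describes needs continuous attribution rather than ad hoc scripts.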

Route your monitor alerts with Datadog monitor notification rules

As organizations scale their infrastructure, monitoring systems can become a source of noise rather than insight. A clean, straightforward set of alerts for a handful of services can quickly spiral into a mess of overlapping thresholds, redundant triggers, and inconsequential notifications across hundreds (or thousands) of components. This flood of notifications can slow response times, overwhelm engineers, and increase the chance of overlooking critical problems.
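The core idea behind tag-based routing can be sketched as a first-match rule table. The rule shape and channel names below are hypothetical, not Datadog's actual configuration format.

```python
# Sketch: first-match routing of monitor alerts to channels by tag.
# Rule format and channel names are illustrative assumptions.

def route_alert(alert_tags, rules, fallback="#alerts-triage"):
    """Return the channel of the first rule whose tags all match the alert."""
    for rule in rules:
        if all(alert_tags.get(k) == v for k, v in rule["match"].items()):
            return rule["channel"]
    return fallback

rules = [
    {"match": {"team": "payments", "severity": "critical"}, "channel": "#payments-oncall"},
    {"match": {"team": "payments"}, "channel": "#payments-alerts"},
]

print(route_alert({"team": "payments", "severity": "critical"}, rules))  # #payments-oncall
print(route_alert({"team": "payments", "severity": "warn"}, rules))      # #payments-alerts
print(route_alert({"team": "search"}, rules))                            # #alerts-triage
```

Ordering rules from most to least specific, with a catch-all fallback, is what keeps noisy defaults from drowning out the notifications that matter.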

Improve SLO accuracy and performance with Datadog Synthetic Monitoring

SLOs are key for improving user satisfaction, prioritizing engineering projects, and measuring overall performance. Given the important role that SLOs play in determining organizational benchmarks, teams need to ensure that SLO metrics—also called service level indicators (SLIs)—are reported accurately and maintained consistently within an acceptable range.
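The arithmetic behind an SLI and its error budget is simple enough to sketch. The numbers below are made up for illustration; a real SLI would be fed by monitoring data such as synthetic test results.

```python
# Sketch: compute an availability SLI and the remaining error budget
# for an SLO target. Inputs are illustrative.

def sli(good_events, total_events):
    """Fraction of good events (e.g., successful synthetic test runs)."""
    return good_events / total_events

def error_budget_remaining(good_events, total_events, slo_target):
    """Unspent share of the error budget: 1.0 = untouched, negative = blown."""
    allowed_bad = (1 - slo_target) * total_events
    actual_bad = total_events - good_events
    return (allowed_bad - actual_bad) / allowed_bad

# A 99.9% SLO over 1,000,000 requests allows 1,000 failures.
print(sli(999_600, 1_000_000))                            # 0.9996
print(error_budget_remaining(999_600, 1_000_000, 0.999))  # ~0.6
```

With 400 of 1,000 allowed failures spent, roughly 60% of the budget remains, which is exactly the kind of signal that drives the prioritization decisions the post mentions.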

Trace Distributed Map states for AWS Step Functions with Datadog

AWS Step Functions offers the Distributed Map state, enabling you to coordinate massively parallel workloads within your serverless applications. With this feature, a single Step Functions execution can fan out into up to 10,000 parallel workflows simultaneously, making it possible to efficiently process millions of items in parallel. This capability unlocks new possibilities for large-scale data processing, such as image transformation, log ingestion, or batch analytics.
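A Distributed Map state in Amazon States Language looks roughly like the sketch below, built here as a Python dict. The bucket name, prefix, Lambda ARN, and state names are placeholders, not values from the post.

```python
import json

# Sketch of a Step Functions Distributed Map state: read items from S3,
# fan each out to a child workflow. All resource names are placeholders.
distributed_map_state = {
    "Type": "Map",
    "ItemReader": {
        "Resource": "arn:aws:states:::s3:listObjectsV2",
        "Parameters": {"Bucket": "example-bucket", "Prefix": "images/"},
    },
    "ItemProcessor": {
        "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "EXPRESS"},
        "StartAt": "ProcessItem",
        "States": {
            "ProcessItem": {
                "Type": "Task",
                "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-item",
                "End": True,
            }
        },
    },
    "MaxConcurrency": 10000,  # up to 10,000 parallel child workflows
    "End": True,
}

print(json.dumps(distributed_map_state, indent=2))
```

The `"Mode": "DISTRIBUTED"` setting is what distinguishes this from an inline Map state, and `MaxConcurrency` caps the fan-out described above.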

How Cursor scaled infrastructure rapidly and reliably using Datadog

At Datadog, we use Cursor to empower our teams to build more quickly. And we know that building and troubleshooting with AI tools like Cursor is done best with the right observability data and context. Discover how Cursor was able to rapidly and reliably scale their infrastructure 100x using Datadog to meet the needs of a fast-growing user base. And learn more about how we're bringing Datadog tools and context to your favorite AI IDEs and agents with our MCP Server and extensions.

Stay Compliant: Meet Your Audit Needs with Datadog!

Datadog's internal compliance team has built audit workflows and control monitoring capabilities using the Datadog platform. We actively use these capabilities to scale our audit programs and meet the requirements of multiple compliance frameworks. This session will go into the details of how we addressed our compliance use cases using the Datadog platform and how our customers can get started.

AI-Augmented Control Plane: Scaling IT Operations with Intelligent Automation

How do you enable a team of 100 engineers to effectively support 300+ critical applications across five hosting platforms? At Thomson Reuters, we turned to AI: not as a buzzword, but as a genuine force multiplier. Experience our journey of transforming traditional IT operations into an AI-augmented powerhouse, where Datadog, ServiceNow, and custom AI solutions work in harmony to create a next-generation control plane. We'll share real victories, honest challenges, and practical insights from our mission to build a more intelligent operational framework.

LLM Observability for Reliability and Stability: A Monitoring Strategy for Phone Communication

LLM APIs offer groundbreaking potential, but also present challenges such as response latency, hallucinations, and service instability. In Japan, where telephone communication remains crucial for business, these issues present significant barriers to the introduction of LLM-based applications. Despite being a relatively young startup, we have developed and deployed an LLM-based telephone service that has handled over 40 million calls.

Datadog + OpenAI: Codex CLI integration for AI-assisted DevOps

We are exploring how we can help on-call engineers troubleshoot incidents more effectively by providing the OpenAI Codex agent with access to real-time observability data in terminals. We've developed an integration and new tool visualizations that connect OpenAI's Codex CLI to the new Datadog MCP server. In this post, we'll share what we've been experimenting with: enabling an AI agent to retrieve production metrics, logs, and incidents from Datadog in real time and act on that context.

Optimize and troubleshoot AI infrastructure with Datadog GPU Monitoring

As organizations bring more AI and LLM workloads into production, the underlying GPU infrastructure becomes even more critical to keeping those workloads fast, reliable, and scalable. Inefficient GPU resource usage, for instance, can lead to longer runtimes and reduced throughput, negatively impacting overall model performance. Additionally, idle and underutilized GPUs can quickly drive up costs and lead to needless spending.

Datadog MCP Server: Connect your AI agents to Datadog tools and context

As development teams adopt AI-powered tools and build services that make use of AI agents, they want to extend their AI capabilities to incorporate familiar tools and observability data. However, AI agents struggle with regular API endpoints: they frequently fail to parse complex nested JSON hierarchies or to handle errors correctly. As a result, these agents often fail to retrieve relevant results.
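To see why nested JSON is hard on agents, consider the common mitigation of flattening a response into dotted key/value pairs. This sketch is purely illustrative of the problem space; it is not how the Datadog MCP server is implemented, and the sample response shape is invented.

```python
# Sketch: flatten a nested API response into dotted key/value pairs,
# a common way to make structured data easier for an agent to consume.

def flatten(obj, prefix=""):
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{i}."))
    else:
        flat[prefix.rstrip(".")] = obj
    return flat

response = {"series": [{"metric": "system.cpu.user", "points": [[1700000000, 42.0]]}]}
print(flatten(response))
# {'series.0.metric': 'system.cpu.user',
#  'series.0.points.0.0': 1700000000, 'series.0.points.0.1': 42.0}
```

An MCP server goes further than flattening: it exposes purpose-built tools with typed inputs and outputs, so the agent never has to reverse-engineer a raw endpoint at all.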

DASH by Datadog 2025 Keynote

Attend the 2025 DASH Keynote and be the first to experience Datadog's latest product innovations. This year, we're unveiling next-generation observability features, innovative ways to secure your AI workloads, and powerful agentic AI capabilities throughout the Datadog platform. Discover the new ways your teams can observe, secure, and act in the age of AI.

Automatically identify issues and generate fixes with the Bits AI Dev Agent

Developers lose hours each week to a familiar troubleshooting loop: chase down telemetry across dashboards, decipher vague errors, and juggle alerts to find the signal worth fixing. Production issues, performance regressions, and security vulnerabilities all demand attention, but they often come with little context for taking action.

Improve performance and reliability with Proactive App Recommendations

As your organization grows, you may operate in increasingly complex environments and manage more services and larger teams to maintain them. Evolution like this can lead to an explosion of telemetry data from across your stack, including metrics, traces, logs, and frontend interactions. The benefit of greater visibility is often outweighed by the challenge of acting on the data you collect, and you can easily fall behind on implementing the fixes your services require to operate reliably and efficiently.

Ensure trust across the entire data life cycle with Datadog Data Observability

As data systems grow more complex and data becomes even more business-critical, teams struggle to detect and resolve issues that impact data quality, reliability, and, ultimately, trust. Engineers have to rely on manual checks and ad hoc SQL queries to catch data quality issues—often after teams relying on the data have noticed something has gone wrong.

Accelerate Oracle Cloud Infrastructure monitoring with Datadog OCI QuickStart

Datadog’s Oracle Cloud Infrastructure integration enables you to collect metrics and logs from your entire OCI stack and monitor them within a single platform alongside other third-party technologies. Datadog’s new OCI QuickStart is a fully managed, single-flow setup experience that helps you monitor your OCI infrastructure and applications in just a few clicks.

Create and monitor LLM experiments with Datadog

To efficiently optimize your LLM application before pushing to production, you need a comprehensive testing and evaluation framework. By running experiments, you can optimize prompts, fine-tune temperature and other key parameters, test complex agent architectures, and understand how your application may respond to atypical, complex, or adversarial inputs. However, it can be difficult to manage your experiment runs and aggregate the results for meaningful analysis.
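The shape of such an experiment loop can be sketched in a few lines. Here `call_model` and `score` are stand-ins invented for illustration; a real setup would call an LLM API and use a proper evaluator (exact-match checks, an LLM judge, etc.).

```python
import statistics

# Sketch: run the same prompt set across parameter variants and
# aggregate evaluation scores per variant. All functions are stand-ins.

def call_model(prompt, temperature):
    # Placeholder: a real implementation would call an LLM API here.
    return f"answer to {prompt!r} at t={temperature}"

def score(output):
    # Placeholder evaluator returning a value in [0, 1).
    return (len(output) % 10) / 10

def run_experiment(prompts, temperatures):
    results = {}
    for t in temperatures:
        scores = [score(call_model(p, t)) for p in prompts]
        results[t] = statistics.mean(scores)
    return results

print(run_experiment(["What is 2+2?", "Summarize FinOps."], [0.0, 0.7]))
```

Even a loop this small generates many runs to track, which is the management and aggregation problem the post is pointing at.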

Introducing Bits AI SRE, your AI on-call teammate

Getting paged pulls engineers away from meaningful work, yet incident response in many organizations remains manual, reactive, and draining. An alert fires and teams scramble to find the root cause, relying on siloed knowledge, incomplete context, and a few on-call experts who are already stretched thin. The rise of AI coding agents has only intensified this challenge: As teams ship code faster with less human oversight, production systems grow increasingly complex and harder to understand.

Migrate historical logs from Splunk and Elasticsearch using Observability Pipelines

Migrating to a new logging platform can be a complex operation, especially when it involves both active and historical logs. Observability Pipelines offers dual-shipping capability, making it easy to route active logs to your new platform without disrupting your log management workflows. But migrating years' worth of historical logs—which are critical for investigating security incidents and demonstrating compliance with applicable laws—requires a different approach.
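The dual-shipping idea itself is a simple fan-out: every active log event goes to both the legacy and the new destination. The sink classes below are hypothetical in-memory stand-ins for pipeline outputs, purely to illustrate the pattern.

```python
# Sketch: dual-shipping active logs to two destinations during a migration.
# MemorySink is a hypothetical stand-in for a real pipeline output.

class MemorySink:
    def __init__(self, name):
        self.name = name
        self.received = []

    def write(self, event):
        self.received.append(event)

def dual_ship(events, sinks):
    """Deliver every event to all configured sinks, so neither side misses data."""
    for event in events:
        for sink in sinks:
            sink.write(event)

legacy, new = MemorySink("splunk"), MemorySink("datadog")
dual_ship([{"msg": "login ok"}, {"msg": "disk full"}], [legacy, new])
print(len(legacy.received), len(new.received))  # 2 2
```

Fan-out covers the active stream; historical logs, as the post notes, need a separate backfill path rather than live routing.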

Create rich, up-to-date visualizations of your AWS infrastructure with Cloudcraft in Datadog

As your cloud environment grows more complex and dynamic, it becomes more difficult to maintain up-to-date reference diagrams of its components that are available to all teams. As a result, teams often end up lacking the visibility they need to understand, manage, and troubleshoot their cloud infrastructure and applications.

Announcing Go tracer v2.0.0

Datadog has long supported the monitoring of instrumented Go applications through our Go tracer v1. As the Go ecosystem has continued to mature, we've been hard at work collecting feedback and improving the tracer's capabilities and usability. We are now thrilled to announce the release of our Go tracer v2.0.0. This major update brings improved security and stability, along with a new, simplified API.

Monitor OpenTelemetry-native metrics with Datadog

OpenTelemetry (OTel) is emerging as the industry standard for collecting and transmitting observability data. Datadog supports several ways to send and accept OTel-native data, while also continuing to support its own native telemetry format. To provide a consistent monitoring experience, Datadog now supports using OTel-native metrics alongside Datadog-native metrics across dashboards, queries, and core visualizations in the Datadog platform.

Best practices for end-to-end custom metrics governance

Custom metrics enable you to track what matters to your distinct business and services and correlate it with the rest of your telemetry data. As your organization grows by adding more teams, services, and environments, your volume of custom metrics can grow with it. To ensure critical visibility while maintaining cost efficiency, organizations need an end-to-end approach to custom metrics governance.
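One governance fundamental worth making concrete: each unique combination of tag values on a metric emits a distinct timeseries, so cardinality grows multiplicatively with tags. The tag names and values below are invented for illustration.

```python
# Sketch: estimate worst-case custom metric cardinality from the number
# of possible values per tag. Tags and values are illustrative.

def max_timeseries(tag_values):
    """Upper bound on timeseries for one metric: the product of tag value counts."""
    count = 1
    for values in tag_values.values():
        count *= len(values)
    return count

tags = {
    "env": ["prod", "staging"],
    "service": ["checkout", "search", "auth"],
    "region": ["us-east-1", "eu-west-1"],
}
print(max_timeseries(tags))  # 2 * 3 * 2 = 12
```

Adding a single high-cardinality tag (a user ID, say) multiplies this bound by thousands, which is why governance reviews tend to focus on which tags a metric actually needs.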

Introducing RUM without Limits: Capture everything, keep what matters

Real User Monitoring (RUM) helps teams understand exactly how their users experience their web and mobile applications—from load times to crashes and frustration signals. But traditional RUM models come with tough trade-offs: capture all sessions and overspend, or sample data and miss what matters. Fixed sampling rates may help manage volume, but they leave dangerous blind spots.
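The blind-spot problem with fixed sampling is easy to demonstrate with a small simulation. The session counts and 10% rate below are invented for illustration; this is a sketch of the trade-off, not of how RUM without Limits works internally.

```python
import random

# Sketch: fixed-rate sampling vs. retaining every session that matters.
# With a 10% sample, a rare error seen in 5 of 10,000 sessions can be
# missed entirely; keeping all error sessions preserves every one.

random.seed(0)  # fixed seed so the simulation is reproducible
sessions = [{"id": i, "error": i < 5} for i in range(10_000)]  # 5 rare error sessions

sampled = [s for s in sessions if random.random() < 0.10]  # fixed 10% sampling
errors_in_sample = sum(s["error"] for s in sampled)

kept = [s for s in sessions if s["error"]]  # always retain error sessions
print(errors_in_sample, "of 5 error sessions survive a 10% sample")
print(len(kept), "of 5 survive when error sessions are always retained")
```

Capturing everything and deciding retention afterward means the filter can run on what actually happened in a session, rather than on a coin flip made before it.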