Route your monitor alerts with Datadog monitor notification rules

As organizations scale their infrastructure, monitoring systems can become a source of noise rather than insight. A clean, straightforward set of alerts for a handful of services can quickly spiral into a mess of overlapping thresholds, redundant triggers, and inconsequential notifications across hundreds (or thousands) of components. This flood of notifications can slow response times, overwhelm engineers, and increase the chance of overlooking critical problems.
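Notification rules address this by routing alerts centrally instead of hard-coding recipients into every monitor. As a toy sketch of that idea (not Datadog syntax — the rule shapes, tags, and handles below are hypothetical):

```python
# Toy sketch of tag-based alert routing: one routing table replaces
# per-monitor recipient lists. Rules, tags, and handles are made up
# for illustration and are not Datadog configuration syntax.

def route_alert(tags, rules):
    """Return the notify targets of the first rule whose
    tag filter is a subset of the alert's tags."""
    tag_set = set(tags)
    for rule in rules:
        if set(rule["match_tags"]) <= tag_set:
            return rule["notify"]
    return ["@fallback-oncall"]  # default route for unmatched alerts

rules = [
    {"match_tags": ["env:prod", "team:payments"], "notify": ["@pagerduty-payments"]},
    {"match_tags": ["env:prod"], "notify": ["@slack-prod-alerts"]},
]

print(route_alert(["env:prod", "team:payments", "service:checkout"], rules))
# -> ['@pagerduty-payments']
```

Because rules are ordered most-specific first, a prod payments alert pages the payments rotation while any other prod alert falls through to the shared Slack channel.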

Improve SLO accuracy and performance with Datadog Synthetic Monitoring

SLOs are key for improving user satisfaction, prioritizing engineering projects, and measuring overall performance. Given the important role that SLOs play in determining organizational benchmarks, teams need to ensure that SLO metrics—also called service level indicators (SLIs)—are reported accurately and maintained consistently within an acceptable range.
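The arithmetic behind an availability SLO is simple: the SLI is the fraction of good requests, and the error budget is the failure the target still permits. A minimal sketch:

```python
# Minimal sketch of availability-SLO arithmetic. The request counts
# and the 99.9% target are example values, not measured data.

def sli(good, total):
    """Service level indicator: fraction of good requests."""
    return good / total

def error_budget_remaining(good, total, target):
    """Fraction of the error budget still unspent (negative once blown)."""
    allowed_failures = (1 - target) * total
    actual_failures = total - good
    return 1 - actual_failures / allowed_failures

# A 99.9% target over 1,000,000 requests allows 1,000 failures;
# 600 actual failures leaves roughly 40% of the budget unspent.
print(sli(999_400, 1_000_000))                            # 0.9994
print(error_budget_remaining(999_400, 1_000_000, 0.999))  # ~0.4
```

Tracking the remaining budget, rather than the raw SLI, makes it obvious when a team can still afford risky deploys and when it cannot.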

Trace Distributed Map states for AWS Step Functions with Datadog

AWS Step Functions offers the Distributed Map state, enabling you to coordinate massively parallel workloads within your serverless applications. With this feature, a single Step Functions execution can fan out into as many as 10,000 parallel child workflows, making it possible to process millions of items efficiently. This capability unlocks new possibilities for large-scale data processing, such as image transformation, log ingestion, or batch analytics.
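For orientation, a Distributed Map state looks roughly like the following Amazon States Language fragment, expressed here as a Python dict for illustration. The bucket, state names, and ARNs are hypothetical; consult the Step Functions documentation for the full schema.

```python
import json

# Hedged sketch of a Distributed Map state (Amazon States Language),
# built as a Python dict. Names, ARNs, and the bucket are placeholders.
distributed_map_state = {
    "ProcessImages": {
        "Type": "Map",
        "ItemReader": {  # stream items straight from S3 instead of the input payload
            "Resource": "arn:aws:states:::s3:listObjectsV2",
            "Parameters": {"Bucket": "example-images-bucket"},
        },
        "ItemProcessor": {
            "ProcessorConfig": {
                "Mode": "DISTRIBUTED",  # run each item as a separate child execution
                "ExecutionType": "EXPRESS",
            },
            "StartAt": "TransformOne",
            "States": {
                "TransformOne": {
                    "Type": "Task",
                    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:transform",
                    "End": True,
                },
            },
        },
        "MaxConcurrency": 10000,  # up to 10,000 parallel child workflows
        "End": True,
    }
}

print(json.dumps(distributed_map_state, indent=2))
```

The `Mode: DISTRIBUTED` setting is what distinguishes this from an inline Map state: each item becomes its own child workflow execution rather than an iteration inside the parent.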

Stay Compliant: Meet Your Audit Needs with Datadog!

Datadog's internal compliance team has built audit workflows and control monitoring capabilities using the Datadog platform. We actively use these capabilities to scale our audit programs and meet the requirements of multiple compliance frameworks. This session will detail how we addressed our compliance use cases with the Datadog platform and how our customers can get started.

How Cursor scaled infrastructure rapidly and reliably using Datadog

At Datadog, we use Cursor to empower our teams to build more quickly. And we know that building and troubleshooting with AI tools like Cursor works best with the right observability data and context. Discover how Cursor rapidly and reliably scaled their infrastructure 100x using Datadog to meet the needs of a fast-growing user base. And learn how we're bringing Datadog tools and context to your favorite AI IDEs and agents with our MCP Server and extensions.

AI-Augmented Control Plane: Scaling IT Operations with Intelligent Automation

How do you enable a team of 100 engineers to effectively support 300+ critical applications across five hosting platforms? At Thomson Reuters, we turned to AI, not as a buzzword but as a genuine force multiplier. Experience our journey of transforming traditional IT operations into an AI-augmented powerhouse, where Datadog, ServiceNow, and custom AI solutions work in harmony to create a next-generation control plane. We'll share real victories, honest challenges, and practical insights from our mission to build a more intelligent operational framework.

LLM Observability for Reliability and Stability: A Monitoring Strategy for Phone Communication

LLM APIs offer groundbreaking potential, but they also present challenges such as response latency, hallucinations, and service instability. In Japan, where telephone communication remains crucial for business, these issues pose significant barriers to the adoption of LLM-based applications. Despite being a relatively young startup, we have developed and deployed an LLM-based telephone service that has handled over 40 million calls.

Datadog + OpenAI: Codex CLI integration for AI-assisted DevOps

We are exploring how we can help on-call engineers troubleshoot incidents more effectively by providing the OpenAI Codex agent with access to real-time observability data in terminals. We've developed an integration and new tool visualizations that connect OpenAI's Codex CLI to the new Datadog MCP server. In this post, we'll share what we've been experimenting with: enabling an AI agent to retrieve production metrics, logs, and incidents from Datadog in real time and act on that context.

Optimize and troubleshoot AI infrastructure with Datadog GPU Monitoring

As organizations bring more AI and LLM workloads into production, the underlying GPU infrastructure becomes even more critical to keeping those workloads fast, reliable, and scalable. Inefficient GPU resource usage, for instance, can lead to longer runtimes and reduced throughput, negatively impacting overall model performance. Additionally, idle and underutilized GPUs can quickly drive up costs and lead to needless spending.
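To make the cost of idle GPUs concrete, here is a back-of-the-envelope sketch: utilization samples below a threshold count as idle time billed at the hourly rate. The rates, threshold, and samples are invented for illustration.

```python
# Back-of-the-envelope estimate of spend on near-idle GPU time.
# All rates, thresholds, and samples below are illustrative.

def idle_cost(samples, hourly_rate, sample_interval_s=60, idle_threshold=0.05):
    """Estimate spend on near-idle GPU time from utilization samples (0.0-1.0)."""
    idle_seconds = sum(sample_interval_s for u in samples if u < idle_threshold)
    return idle_seconds / 3600 * hourly_rate

# One hour of per-minute samples: 45 min busy, 15 min idle, at $2.50/hour.
samples = [0.85] * 45 + [0.0] * 15
print(round(idle_cost(samples, hourly_rate=2.50), 4))  # 0.625
```

Even 25 percent idle time on a single $2.50/hour GPU adds up to roughly $0.63 per hour of pure waste, which compounds quickly across a fleet.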

Datadog MCP Server: Connect your AI agents to Datadog tools and context

As development teams adopt AI-powered tools and build services that make use of AI agents, they want to extend their AI capabilities to incorporate familiar tools and observability data. However, AI agents struggle with regular API endpoints: they often misparse complex nested JSON hierarchies or mishandle errors. As a result, these agents frequently fail to retrieve relevant results.
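One common remedy, shown here as a toy sketch, is to flatten deeply nested API responses into simple dotted key/value pairs before handing them to a model; this is illustrative preprocessing, not the MCP server's actual implementation.

```python
# Toy sketch of flattening a nested API response into dotted
# key/value pairs that are easier for an agent to read than
# a raw JSON hierarchy. Not the Datadog MCP server's code.

def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into {'a.b.0.c': value} pairs."""
    flat = {}
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = ((str(i), v) for i, v in enumerate(obj))
    else:
        return {prefix: obj}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat

response = {"monitor": {"id": 42, "tags": ["env:prod"], "state": {"status": "Alert"}}}
print(flatten(response))
# {'monitor.id': 42, 'monitor.tags.0': 'env:prod', 'monitor.state.status': 'Alert'}
```

Flat paths like `monitor.state.status` are far harder for an agent to misparse than a three-level-deep object, which is the general idea behind agent-friendly tool responses.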