|
By Jonathan Morin
Teams are wiring AI coding agents straight to their warehouse over MCP and asking things like “What was our revenue by channel in Q2?” The agent finds a revenue table, runs a query, and returns a number in seconds, with no waiting on the data team. The answer looks right at first, but the number is often wrong.
|
By Samantha Scaglione
Monitoring is built around the system a team understands at a point in time. Engineers add endpoints, move dependencies, and change user flows every day. Over time, that creates coverage drift as monitors keep reflecting the system as it used to behave, while changing paths introduce failure modes that teams didn’t yet know to watch for. Bits Detection automatically creates, tunes, and maintains monitors for your services.
|
By Datadog
A challenge for many teams continues to be managing cost, governance, and reliability across an ever-larger footprint. This year’s DASH announcements help teams operate efficiently at scale, with new tools to cut cloud and AI spend, eliminate waste automatically, maintain observability during outages, and manage many organizations and agents as a single unit.
|
By Nicole Parisi
Finding the right information across dashboards, monitors, and telemetry sources takes time, even for experienced engineers. When something breaks, it often means figuring out where to start, rebuilding queries, and jumping between metrics, logs, and traces before you can take action. The challenge isn’t a lack of data but the effort required to surface the right information at the right moment.
|
By Cody Lee
AI agents are becoming a standard part of how engineers write, deploy, and troubleshoot software. Getting observability data into those workflows, securely and without manual intervention, remains the harder problem.
|
By Amber Tunnell
Building automated workflows that adapt to real-world complexity can be a challenge. As systems scale and scenarios multiply, teams often end up hardcoding endless logic branches just to handle every potential outcome. That’s why we’re introducing Bits Agent Builder, a powerful new tool that lets you create custom AI agents that are fully hosted by Datadog.
|
By Michael Cronk
Azure Managed Redis is a Microsoft first-party, fully managed in-memory data store, replacing Azure Cache for Redis tiers. It includes Redis Enterprise features such as RediSearch for vector search and full-text search, in addition to RedisJSON, RedisTimeSeries, and Active Geo-Replication. As Azure Cache for Redis reaches end of life, more teams are planning migrations to Azure Managed Redis in search of better performance, lower cost, and modern capabilities for AI and real-time workloads.
|
By Charles Yu
Spark jobs only get more expensive and harder to debug as they scale. It’s a problem we’ve run into ourselves. Our Referential Data Platform team builds and maintains the knowledge graph that maps relationships between customers’ observability entities. ServiceQueryEdge is at the center of that graph, mapping service entities to their associated metric and log queries.
|
By Mallory Mooney
In AWS environments, a data perimeter is a set of preventative controls that help ensure that your trusted cloud identities (principals or AWS services acting on your behalf) are accessing trusted resources from authorized networks. You can apply these controls at various levels of your infrastructure, such as per resource or across all resources in your AWS account.
|
By David Lentz
If you serve LLMs on Kubernetes without inference-aware routing, your load balancer is likely wasting inference capacity. Generic HTTP traffic management blindly routes requests, assuming the backends in your cluster are interchangeable. But your model-serving backends are stateful and unevenly prepared to handle any given request. As a result, requests are often routed to the backend that’s not the one best suited to respond.
|
By Datadog
Datadog launched 100+ capabilities to help customers drive autonomy and manage growing AI and security complexity. With new Bits AI, log management, and security capabilities, customers have the visibility and autonomous operations they need to detect, investigate, and resolve issues across the development loop and data lifecycle. Tune in to the full keynote to catch the highlights.
|
By Datadog
In this video, you'll learn how Datadog GPU Monitoring gives ML and platform teams a single view of their GPU fleet, so they can see what's slowing down their AI workloads, fix issues faster, and use the GPUs they already have more efficiently.
|
By Datadog
"We're doing cutting-edge AI, focused on real translational impact: getting our research over the wall and into production." Ameet Talwalkar, Datadog's Chief Scientist, shares what it took to build the AI Research Lab from the ground up — and what makes DAIR different from traditional research teams. At Datadog, research ships. Recent work from the lab includes Toto 2.0, open-weights time series forecasting models ranked on leading benchmarks, and ARFBench, a new benchmark for evaluating AI on real incident data.
|
By Datadog
Datadog has always been driven by a broader vision of helping teams understand and operate complex systems. In this session, you’ll hear from Michael Whetten, Product SVP, and Abrar Hussain, Senior Director, Product Management, as they share the latest updates across the Datadog product suite and discuss how that vision continues to shape the platform’s evolution and support the next generation of AI-driven applications.
|
By Datadog
In the fast-paced world of mobile development, reliability rarely fails with a loud crash; instead, it degrades quietly through micro-regressions that erode user trust and engagement. While most companies track backend health and API latency, they often fly blind regarding the actual screen-level responsiveness that defines the true user experience. When Expedia Group underwent a major technical evolution, the team realized they lacked a consistent baseline to compare performance across platforms, leaving them unable to validate improvements before rollout.
|
By Datadog
You’re told to “go build agents” without clear guidance on what that actually means, how to do it well, or how to know if it is working. You are not a data scientist. You are a software engineer. In this talk, Datadog AI product leader Shri Subramanian breaks down what changes when you move from building applications to building AI agents, and why familiar approaches like traditional testing and linear delivery fall short. We will explore how agent development shifts the focus from code alone to data, prompts, and evaluation, and why functional reliability matters just as much as operational reliability.
|
By Datadog
Join Datadog CPO Yanbing Li and a special guest as they discuss emerging technologies and innovation, how they impact businesses today, and the new opportunities they create for you.
|
By Datadog
Delivering great products to your customers requires a mix of evolution and consistency. To really land with users, your product has to be ready to adapt and scale, prioritizing across a mix of customer and business needs. Join experts in reliability, systems engineering, and DevOps as they share real-world examples, true stories of pitfalls, and astounding impact from the experiments they have run. Learn how experienced practitioners handle failure, adapt to scale, and bridge gaps between teams to improve software performance and customer outcomes.
|
By Datadog
When stakeholders push for faster growth (new markets, new features, a newly modernized stack), your engineering model has to change too. At FitnessPassport, the shift from offshore waterfall delivery to an in-house team meant rebuilding not just services, but confidence: legacy systems with weak logging and little visibility made it hard to know whether changes were working and impossible to spot issues before users did. In this talk, Director of Engineering Rob Mitchell will share how FitnessPassport adopted Datadog and used structured logs, metrics, and traces to tighten feedback loops.
|
By Datadog
Platform teams often end up as the bottleneck for “small” operational asks: add a new button, wire up a workflow, expose one more cloud capability—each change requiring engineering time, reviews, and releases. In this technical deep dive, engineers from the Department of Government Services (Victoria) share the architecture and open source CDK library behind their “Infrastructure Control Panel”: a modular operational enablement app that lets non-technical users interact safely with cloud resources through strong access controls.
|
By Datadog
As Docker adoption continues to rise, many organizations have turned to orchestration platforms like ECS and Kubernetes to manage large numbers of ephemeral containers. Thousands of companies use Datadog to monitor millions of containers, which enables us to identify trends in real-world orchestration usage. We're excited to share 8 key findings of our research.
|
By Datadog
The elasticity and nearly infinite scalability of the cloud have transformed IT infrastructure. Modern infrastructure is now made up of constantly changing, often short-lived VMs or containers. This has elevated the need for new methods and new tools for monitoring. In this eBook, we outline an effective framework for monitoring modern infrastructure and applications, however large or dynamic they may be.
|
By Datadog
Where does Docker adoption currently stand and how has it changed? With thousands of companies using Datadog to track their infrastructure, we can see software trends emerging in real time. We're excited to share what we can see about true Docker adoption.
|
By Datadog
Build an effective framework for monitoring AWS infrastructure and applications, however large or dynamic they may be. The elasticity and nearly infinite scalability of the AWS cloud have transformed IT infrastructure. Modern infrastructure is now made up of constantly changing, often short-lived components. This has elevated the need for new methods and new tools for monitoring.
|
By Datadog
Like a car, Elasticsearch was designed to allow you to get up and running quickly, without having to understand all of its inner workings. However, it's only a matter of time before you run into engine trouble here or there. This guide explains how to address five common Elasticsearch challenges.
|
By Datadog
Monitoring Kubernetes requires you to rethink your monitoring strategies, especially if you are used to monitoring traditional hosts such as VMs or physical machines. This guide prepares you to effectively approach Kubernetes monitoring in light of its significant operational differences.
Datadog is the essential monitoring platform for cloud applications. We bring together data from servers, containers, databases, and third-party services to make your stack entirely observable. These capabilities help DevOps teams avoid downtime, resolve performance issues, and ensure customers are getting the best user experience.
See it all in one place:
- See across systems, apps, and services: With turn-key integrations, Datadog seamlessly aggregates metrics and events across the full DevOps stack.
- Get full visibility into modern applications: Monitor, troubleshoot, and optimize application performance.
- Analyze and explore log data in context: Quickly search, filter, and analyze your logs for troubleshooting and open-ended exploration of your data.
- Build real-time interactive dashboards: More than summary dashboards, Datadog offers all high-resolution metrics and events for manipulation and graphing.
- Get alerted on critical issues: Datadog notifies you of performance problems, whether they affect a single host or a massive cluster.
Modern monitoring & analytics. See inside any stack, any app, at any scale, anywhere.