Practical AI-Enabled Observability for Agents and LLMs

You’re told to “go build agents” without clear guidance on what that actually means, how to do it well, or how to know if it is working. You are not a data scientist. You are a software engineer. In this talk, Datadog AI product leader Shri Subramanian breaks down what changes when you move from building applications to building AI agents, and why familiar approaches like traditional testing and linear delivery fall short. We will explore how agent development shifts the focus from code alone to data, prompts, and evaluation, and why functional reliability matters just as much as operational reliability.

Updated Web Management Console Demo | On-Call Management, Hospital Communication & Call Routing

See the next-generation OnPage Enterprise Web Management Console in action, built to simplify on-call scheduling, incident alerting, critical communication workflows, and post-event reporting. In this demo, we walk through how teams can manage on-call schedules and escalation paths; send and track critical alerts in real time; gain visibility into alert activity, read rates, and response timelines; configure contact groups and communication workflows; and use the new Lines Management module to set up call routing, menus, and rules through a self-service interface.

Resolve Reels - Ep. 2 - Scheduled Jobs Dashboard LI

Episode 2 of Resolve Reels is here. In this walkthrough, we introduce the new Scheduled Workflows Dashboard in Resolve Actions. Get a centralized view of every scheduled automation across your environment. Track execution status, monitor success and failure rates, and quickly drill into workflow performance. This is how modern IT teams move from reactive oversight to proactive control. With Resolve, automation is not just executed. It is continuously monitored, optimized, and scaled.

From Manual Requests to Self-Serve: Building an Access-Controlled App that Adapts Automatically

Platform teams often end up as the bottleneck for “small” operational asks: add a new button, wire up a workflow, expose one more cloud capability—each change requiring engineering time, reviews, and releases. In this technical deep dive, engineers from the Department of Government Services (Victoria) share the architecture and open source CDK library behind their “Infrastructure Control Panel”: a modular operational enablement app that lets non-technical users interact safely with cloud resources through strong access controls.

We Know Before it Breaks: Observability-Driven Development

When stakeholders push for faster growth (new markets, new features, a newly modernized stack), your engineering model has to change too. At FitnessPassport, the shift from offshore waterfall delivery to an in-house team meant rebuilding not just services, but confidence: legacy systems with weak logging and little visibility made it hard to know whether changes were working and impossible to spot issues before users did. In this talk, Director of Engineering Rob Mitchell will share how FitnessPassport adopted Datadog and used structured logs, metrics, and traces to tighten feedback loops.

End to End Reliability for all your Workloads

Delivering great products to your customers requires a mix of evolution and consistency. To really land with users, your product has to be ready to adapt and scale, prioritizing across a mix of customer and business needs. Join experts in reliability, systems engineering, and DevOps as they share real-world examples, true stories of pitfalls, and astounding impact from the experiments they have run. Learn how experienced practitioners handle failure, adapt to scale, and bridge gaps between teams to improve software performance and customer outcomes.

Performance Testing vs Load Testing: Simple Difference

Learn the clear difference between performance testing and load testing in this quick video. Performance testing checks how well your software works under different conditions like speed, stability, and scalability. Load testing focuses only on how the system handles expected user traffic. If you want to build reliable applications, knowing these two helps you test smarter. Perfect for developers, testers, and QA teams.