Evaluating our AI Guard application to improve quality and control cost

This article is part of our series on how Datadog’s engineering teams use LLM Observability to build, monitor, and improve AI-powered systems. Organizations are building AI agents that help users automate work, analyze data, and interact with complex systems through natural language. As these agents become more capable, they also become more complex and exposed to risks such as prompt injection, data leaks, and unsafe code execution.

Identify untested code across every level of your codebase

As organizations scale their services and adopt AI-assisted coding, code changes are landing faster and in greater volume than ever before. While these practices accelerate the pace of development, they also increase the likelihood that untested code will slip into repositories undetected. The problem is compounded by the fact that most teams have no reliable way to know which code is actually covered by tests.

Make use of guardrail metrics and stop babysitting your releases

Modern CI/CD pipelines have automated the hard work of building, testing, and deploying our code. But for many teams, that’s where the automation stops. The most critical part of a release, actually turning a new feature on for real users, is still a stressful, manual process. An engineer cautiously ramps traffic to 5%, then 10%, while the whole team stares at dashboards, watching for anything that breaks. If something does, they scramble to roll back by hand.
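The manual loop described here (ramp, watch a metric, roll back) is easy to sketch in code. Below is a minimal, illustrative Python sketch of a guardrail-driven rollout, not Datadog’s implementation or API; the helpers set_traffic_percentage, query_error_rate, and rollback are hypothetical stand-ins for your deployment platform and metrics backend.

```python
import random
import time

# Hypothetical stand-ins for a deployment platform and metrics backend.
def set_traffic_percentage(pct: int) -> None:
    print(f"Routing {pct}% of traffic to the new version")

def query_error_rate() -> float:
    # A real gate would query a guardrail metric (e.g., error rate)
    # over the soak window; here we simulate one.
    return random.uniform(0.0, 0.02)

def rollback() -> None:
    print("Guardrail breached: rolling back to the previous version")

RAMP_STEPS = [5, 10, 25, 50, 100]  # percent of traffic per ramp step
ERROR_RATE_THRESHOLD = 0.01        # guardrail: abort above 1% errors
SOAK_SECONDS = 300                 # observation window per step

def progressive_rollout() -> bool:
    """Ramp traffic step by step, rolling back if the guardrail trips."""
    for pct in RAMP_STEPS:
        set_traffic_percentage(pct)
        time.sleep(SOAK_SECONDS)  # let the guardrail metric accumulate
        if query_error_rate() > ERROR_RATE_THRESHOLD:
            rollback()
            return False
    return True

if __name__ == "__main__":
    print("Release succeeded" if progressive_rollout() else "Release aborted")
```

The point of encoding the ramp this way is that the rollback decision is made by a metric threshold, not by an engineer watching a dashboard.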

Improve performance and reliability with APM Recommendations

SREs and application developers rely on telemetry data to understand and improve their systems. As organizations scale and evolve, those systems generate an ever-growing volume of metrics, logs, and traces. But more data alone does not make it easier to improve performance or reliability: identifying meaningful optimizations still requires careful investigation and analysis.

Monitor Fortinet FortiManager performance in Datadog

As enterprises scale, teams often find it harder to pinpoint the cause of user-reported issues. Software-defined wide area networks (SD-WANs) can make it easier to add branch offices, but they can also make it more challenging to distinguish connectivity degradation from changes in application behavior. FortiManager provides a centralized control plane for Fortinet Secure SD-WAN and reduces operational complexity.

Improve test coverage across codebases with Datadog Code Coverage

As codebases grow across many different services, it becomes harder to see what test suites actually cover. AI-assisted development and faster release cycles increase the volume of changes landing in repositories, raising the risk that untested code will make it through to production. To maintain quality, teams need clear, scalable visibility across repositories, consistent testing standards, and a way to catch blind spots before they reach users.

Move fast, don't break things: Consistent testing standards at scale

Moving quickly is essential for modern engineering teams, but speed without guardrails can introduce hidden risks in testing. As organizations scale, teams often define and apply coverage standards inconsistently across services and repositories. What qualifies as “acceptable coverage” in one project may be completely different in another. Without automated enforcement, untested code can slip through reviews.
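To make “automated enforcement” concrete, here is a minimal, illustrative Python sketch of a CI coverage gate. It assumes a Cobertura-style coverage.xml report and a hypothetical 80% threshold; neither is a Datadog feature, and real standards would vary per repository.

```python
import sys
import xml.etree.ElementTree as ET

THRESHOLD = 0.80  # hypothetical org-wide minimum line coverage

def check_coverage(report_path: str) -> None:
    # Cobertura-style reports expose overall line coverage as a
    # "line-rate" attribute on the root <coverage> element.
    root = ET.parse(report_path).getroot()
    line_rate = float(root.attrib["line-rate"])
    if line_rate < THRESHOLD:
        print(f"FAIL: coverage {line_rate:.1%} is below {THRESHOLD:.0%}")
        sys.exit(1)  # non-zero exit fails the CI job
    print(f"OK: coverage {line_rate:.1%} meets the standard")

if __name__ == "__main__":
    check_coverage(sys.argv[1] if len(sys.argv) > 1 else "coverage.xml")
```

Running a gate like this in every repository’s pipeline turns “acceptable coverage” from a per-project opinion into a single enforced rule.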

Surface and remediate runtime posture issues with Workload Protection Findings

Threat detection and runtime posture monitoring are related but different jobs. Security teams already rely on Datadog Workload Protection to detect threats in real time across hosts and containers. But the actions that lead to those detections (file manipulation, process execution, network calls, or kernel activity) can indicate either compromise or simply risky behavior, like running compilers in production containers.