Annotate traces to improve LLM quality with Datadog LLM Observability

LLM applications rarely crash. They degrade quietly. Once these applications are shipped to production, subtle quality failures become harder to catch with traditional signals. Tone shifts, hallucinated details, off-topic responses, and incomplete reasoning can emerge while latency and token usage look stable.

Why AI-Driven Automation Can't Wait

Operators today are navigating unprecedented complexity—rising costs, accelerating customer expectations, and increasingly dynamic networks. In this recent video interview, my colleague Kevin Wade and I explore why AI‑driven automation has shifted from a “nice‑to‑have” technology to a core business requirement for telecom operators and beyond.

How OpenRouter and Grafana Cloud bring observability to LLM-powered applications

Chris Watts is Head of Enterprise Engineering at OpenRouter, where he builds infrastructure for AI applications. He previously worked at Amazon and was a startup founder. As large language models become core infrastructure for more and more applications, teams are discovering a familiar challenge in a new context: you can't improve what you can't see.

Introducing Calico Load Balancer and Seamless VM-to-Kubernetes Migration

SAN JOSE, Calif., March 23, 2026 — Tigera, the creator and maintainer of Project Calico, today announced a major expansion of its Unified Network Security Platform for Kubernetes, aimed at helping enterprises consolidate infrastructure and accelerate the migration of legacy workloads to cloud-native platforms.

The Hidden AI Bill: Why Non-Prod LLM Costs Spiral

Most teams know they are spending money on AI in production. Far fewer realize how much they are spending outside production, where it's easy to lose track of spend while evaluating which model gives the best responses and is fast enough and cheap enough to run in production. That blind spot exists because the AI bill usually shows up as one giant blob. It is easy to see the total.

Harness AI for Argo CD

Managing GitOps at scale shouldn’t feel like an endless game of “Whac-A-Mole.” In this 3-minute demo, we show how Harness AI moves beyond simple syncs to provide agentic troubleshooting and automated orchestration for your entire GitOps estate. Watch as we use the Harness DevOps Agent to:

Identify Common Failure Patterns: Instead of clicking through individual clusters, we ask the AI to analyze 4 out-of-sync applications simultaneously.

FinOps Leaders Who Will Win The AI Era Are Already Experimenting

Engineering teams are shipping faster than ever. AI coding tools like Claude Code and OpenAI’s Codex have quietly removed some of the biggest friction points in the development cycle — and the result is that FinOps teams are being asked to keep up with a pace most practitioners haven’t fully reckoned with yet. That acceleration has a cost consequence. More shipping means more services, more experiments, more infrastructure spun up without review cycles.