
Reading the agent traces is how you make the call your eval can't

Remember being excited about writing unit tests (or dreading it, depending on the stage of your career and the company you worked at)? Or sweating every detail of the end-to-end and integration tests you were sure covered all the use cases your users would hit? These days a lot of UIs are slowly being replaced by a single input field and an agent that promises to deliver the same value a UI would, but with the elegance and pun-ness of a “Jarvis”.

Shipped: Turn your Bifrost gateway into an AI spend meter

If you route model traffic through Bifrost, you already have the hard part: one place every AI call passes through, where the model, the tokens, and the cost are visible on the way past. It’s the cheapest spot in your stack to measure AI spend. What’s missing is everything downstream – today that usage only becomes “spend” weeks later, when the provider invoice lands as a lump sum you can’t break apart.

Don't 'control' your AI spend. Understand it and be intentional.

There’s a good interview making the rounds. BizTech sat down with IBM’s James Stevenson to talk about how financial institutions can get a handle on cloud and AI costs. The advice is solid: get visibility, kill idle resources, tighten governance, tag everything. And pull finance and engineering into the same room. I don’t disagree with it. But I read the whole piece and noticed where the gravity pulls: control costs, reduce waste, bring down spend. The headline says it (‘Q&A.

Accelerate investigations with AI in Datadog Incident Response

Engineering teams spend much of their incident response time investigating the problem and coordinating the response. Both tasks become harder when telemetry data lives in one place, deployment history is stored in another, and conversations unfold across chat channels and incident bridges. Responders often spend the first part of an incident rebuilding context before they can begin testing hypotheses and working toward resolution.

How Datadog uses AI to build internal software delivery tools and improve system performance

At Datadog, we want our developers to become better at using AI tools with the end goal of building quality software, faster, that generates real value. This includes not only the products and features that our customers use, but also the internal tools that help keep our workflows running smoothly behind the scenes.

What the World Cup Looks Like in Internet Traffic

The World Cup may be the most-watched event in media history — so what does it look like from inside the network? We dug into ISP traffic data to reveal how Fox Sports peaks during US games, why second halves usually win, and how traffic flows shift for entire nations like Brazil and Iran when their team takes the field.

What's New in InfluxDB and Telegraf: Q2 2026 Product Updates

Summary: Q2 was about giving teams more leverage with less overhead. Between April and June 2026, releases across Telegraf, InfluxDB 3, and InfluxDB 3 Explorer focused on reducing manual work and putting more control directly in their hands as they scale. Telegraf Enterprise reached general availability, giving teams a centralized way to manage, monitor, and support tens of thousands of Telegraf agents.

The Next Enterprise AI Challenge: The Multi-Model Workplace

For the last two years, enterprise AI strategy has largely focused on one thing: adoption. Organizations encouraged employees to experiment with ChatGPT, Claude, Copilot, Gemini, and dozens of emerging AI tools in the hope that productivity gains would naturally follow. CIOs approved pilots, departments launched AI task forces, and leaders pushed teams to integrate AI into everyday work as quickly as possible. But the enterprise AI conversation is beginning to change.

Rundeck/RBA 6.0: Modernizing the foundation your automation runs on

This blog post is part of PagerDuty’s ongoing series on how we’re helping customers navigate their journey towards autonomous operations. Read on to learn how PagerDuty’s Rundeck/RBA 6.0, recently announced in GA, builds towards this vision.

AI Orchestrations: Your easy button for proactive operations

This blog post is part of PagerDuty’s ongoing series on how we’re helping customers navigate their journey towards autonomous operations. Read on to learn about how AI Orchestrations builds towards this vision. “We should automate this.” Sound familiar? For many operations teams, that sentence never becomes action. Building event orchestration rules demands deep platform expertise, time no one has, and the ability to spot which patterns in your data actually matter.