The latest News and Information on APIs, Mobile, AI, Machine Learning, IoT, Open Source and more!

Reading the agent traces is how you make the call your eval can't

Remember being excited about writing unit tests (or dreading it, depending on the stage of your career and the company you worked at)? Or sweating every detail of the end-to-end and integration tests you were sure covered all the use cases your users would hit? These days many UIs are slowly being replaced by a single input field and an agent that promises to deliver the same value a UI would, but with the elegance and pun-ness of a “Jarvis”.

AI Tool Sprawl Is Killing Enterprise ROI | Why Orchestration Matters More Than AI Features

Enterprise AI adoption is accelerating, but are organizations actually solving business problems or just adding more tools? In this episode of Agents of IT, Fran Fernandez (Chief Product Officer at Resolve) and Zach Austin (Director of Product Marketing) explore one of the biggest challenges facing enterprise IT in 2026: AI tool sprawl. They discuss why many organizations struggle to demonstrate ROI from AI investments, how disconnected AI assistants create operational complexity, and why orchestration, automation, and context have become the real differentiators for enterprise AI success.

Shipped: Turn your Bifrost gateway into an AI spend meter

If you route model traffic through Bifrost, you already have the hard part: one place every AI call passes through, where the model, the tokens, and the cost are visible on the way past. It’s the cheapest spot in your stack to measure AI spend. What’s missing is everything downstream – today that usage only becomes “spend” weeks later, when the provider invoice lands as a lump sum you can’t break apart.

Don't 'control' your AI spend. Understand it and be intentional.

There’s a good interview making the rounds. BizTech sat down with IBM’s James Stevenson to talk about how financial institutions can get a handle on cloud and AI costs. The advice is solid: get visibility, kill idle resources, tighten governance, tag everything. And pull finance and engineering into the same room. I don’t disagree with it. But I read the whole piece and noticed where the gravity pulls: control costs, reduce waste, bring down spend. The headline says it (‘Q&A.

Accelerate investigations with AI in Datadog Incident Response

Engineering teams spend much of their incident response time investigating the problem and coordinating the response. Both tasks become harder when telemetry data lives in one place, deployment history is stored in another, and conversations unfold across chat channels and incident bridges. Responders often spend the first part of an incident rebuilding context before they can begin testing hypotheses and working toward resolution.

How Datadog uses AI to build internal software delivery tools and improve system performance

At Datadog, we want our developers to become better at using AI tools, with the end goal of building quality software faster and generating real value. This includes not only the products and features that our customers use, but also the internal tools that help keep our workflows running smoothly behind the scenes.

The Next Enterprise AI Challenge: The Multi-Model Workplace

For the last two years, enterprise AI strategy has largely focused on one thing: adoption. Organizations encouraged employees to experiment with ChatGPT, Claude, Copilot, Gemini, and dozens of emerging AI tools in the hope that productivity gains would naturally follow. CIOs approved pilots, departments launched AI task forces, and leaders pushed teams to integrate AI into everyday work as quickly as possible. But the enterprise AI conversation is beginning to change.

AI Orchestrations: Your easy button for proactive operations

This blog post is part of PagerDuty’s ongoing series on how we’re helping customers navigate their journey towards autonomous operations. Read on to learn about how AI Orchestrations builds towards this vision. “We should automate this.” Sound familiar? For many operations teams, that sentence never becomes action. Building event orchestration rules demands deep platform expertise, time no one has, and the ability to spot which patterns in your data actually matter.

PagerDuty agent app in GitHub: incident context where you already work

This blog post is part of PagerDuty’s ongoing series on how we’re helping customers navigate their journey toward autonomous operations. Read on to learn about the PagerDuty agent app in GitHub (Early Access) and how it builds toward this vision. How many tabs do you have open right now? And how many more do you open the moment an incident hits? Context switching during incident response is one of the most persistent sources of toil in engineering.