
The Journey to Achieving Hyperscale Availability with AI-Driven Prediction

At hyperscale, a regional cloud outage is not merely a technical disruption—for Samsung Account, which serves 2.1 billion users across three global regions, it is an immediate global service crisis. Fragmented, region-siloed monitoring creates blind spots that make early detection nearly impossible, leaving SRE teams perpetually reactive rather than predictive. The path to proactive reliability requires both a philosophical shift and a foundational change in how observability data is collected, unified, and reasoned over.

Building an AI Ready Data Backbone: Dima Kan at AICamp 2026

The Aiven Platform is more than a collection of open source services for streaming, storing, and analyzing data. The platform ensures that all services run reliably and securely in the clouds of your choice, are observable, and can easily be integrated with each other and with external third-party tools.

Debug and evaluate your AI app from your coding agent with Datadog Agent Observability

Coding agents like Claude Code, Cursor, and Codex CLI handle the coding parts of building an AI application well. The harder work comes after: understanding why a response went wrong, building eval sets that reflect real production behavior, and keeping up with an application that changes faster than any one-off script can. Teams spend 60–80% of their time on evaluation and error analysis, and much of that work needs to be redone every time the stack shifts.

5 pitfalls to avoid when measuring DevEx in the AI era

Developer experience, commonly known as DevEx, describes how an organization’s systems, workflows, tools, and culture affect developer productivity. A positive DevEx leads to tangible organizational benefits, including faster releases, increased innovation, and reduced technical debt. Measuring DevEx enables engineering management to quantify their team’s impact and understand where to direct improvement efforts.

Datadog acquires Adaptive ML

Off-the-shelf models are easy to deploy, but they are rarely enough to solve complex, domain-specific challenges in production. The key to sustained AI value is not in the models themselves but in the ability to tune, evaluate, and refine those models against your organization’s real-time signals. We are excited to announce that Adaptive ML is joining Datadog to accelerate this vision by combining our deep observability data with their expertise in building specialized, high-performance AI agents.

What Customers Are Doing With AI and Honeycomb

At O11yCon, we talked to engineering teams across the industry, and the numbers are starting to get genuinely wild: Mixpanel DevOps Engineer Eddie Bracho told us their engineering team is generating 50% more PRs than before AI came into the mix (sorry). That kind of velocity is exciting, but it's also a pressure test for every part of your stack that isn't writing code, including your observability practice. Here's what we're hearing from customers about how that's playing out.

Shipped: LiteLLM is probably under-counting your Claude spend

If you run Claude through LiteLLM, some of that spend is probably going uncounted, and you can’t see it precisely because the data isn’t there. Routing through a gateway is messier than it looks: LiteLLM alone can carry Claude several ways, including the OpenAI-compatible endpoint and the Anthropic pass-through proxy that the native SDK and Claude Code use, and each path describes the same call differently.
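To make "each path describes the same call differently" concrete, here is a minimal Python sketch of folding two usage shapes into one record before counting tokens. The field names come from the public OpenAI-style (`prompt_tokens`, `completion_tokens`) and Anthropic-style (`input_tokens`, `output_tokens`, plus the cache fields) response formats. The `Usage` class and `normalize_usage` helper are hypothetical names invented for illustration, and exactly how LiteLLM populates these fields on each path is an assumption, not something the excerpt states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Hypothetical unified token record (not a LiteLLM type)."""
    input_tokens: int
    output_tokens: int
    cache_write_tokens: int = 0
    cache_read_tokens: int = 0

    @property
    def total(self) -> int:
        # Sum every bucket so cache traffic is not silently dropped.
        return (self.input_tokens + self.output_tokens
                + self.cache_write_tokens + self.cache_read_tokens)


def normalize_usage(usage: dict) -> Usage:
    """Map an OpenAI-style or Anthropic-style usage dict onto Usage."""
    if "prompt_tokens" in usage:
        # OpenAI-compatible shape: prompt/completion counts only.
        return Usage(
            input_tokens=int(usage.get("prompt_tokens") or 0),
            output_tokens=int(usage.get("completion_tokens") or 0),
        )
    if "input_tokens" in usage:
        # Anthropic-style shape: cache reads and writes are reported
        # in separate fields, which may be absent or null.
        return Usage(
            input_tokens=int(usage.get("input_tokens") or 0),
            output_tokens=int(usage.get("output_tokens") or 0),
            cache_write_tokens=int(usage.get("cache_creation_input_tokens") or 0),
            cache_read_tokens=int(usage.get("cache_read_input_tokens") or 0),
        )
    # Fail loudly: an unrecognized shape is exactly how spend goes uncounted.
    raise ValueError(f"unrecognized usage shape: {sorted(usage)}")
```

Raising on an unknown shape, rather than returning zeros, is a deliberate choice in this sketch: a silent zero would look the same as the missing data the excerpt warns about.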

AI ROI Dispatches: How a non-engineer solved a $300K problem for under $1K

A year ago, the sentence “I just deployed an app on GitHub” wouldn’t have made sense coming from me. I’m the VP of People at CloudZero; code deployments and I were not close friends. That’s changed. In this AI era, non-engineers are building, and I think that’s a genuinely good thing. But only if it’s tied to something that matters.