
From FinOps for AI to AI-Native FinOps

One year ago, at AWS re:Invent, we launched CloudZero Advisor, a free, standalone AI assistant that enables anyone to ask questions about cloud spend in plain language. It was the first experiment of its kind in FinOps, a chance to see what people really wanted to know when cost data finally became conversational. Over the past year, Advisor has become a learning engine.

All Is Calm, All Is Compliant: Staying Audit-Ready Through the Year-End Rush

As the year winds down, I find that most cybersecurity and compliance teams are focused on closing projects, hitting targets, and perhaps planning a well-earned break. But regulators? They don't take holidays. The FCA, the PRA, and the data protection authorities that enforce GDPR remain vigilant, and so should you. For IT leaders, this season often feels like walking a tightrope, balancing operational demands against the relentless need for compliance.

Grafana Service Center: Simplify Service Reliability in One Place

Grafana Service Center gives engineers and stakeholders a single place to manage service reliability. In this video, Staff Product Manager Ryan Kehoe walks through how Service Center ties together alerts, SLOs, dashboards, incidents, and metadata for each service. Learn how to centralize reviews, speed up investigations, and improve visibility across your teams, all within Grafana Cloud.

AI Infrastructure Is Creating a New Wave of Incidents: Why Enterprises Need a Modern On-Call Strategy

Over the last few years, AI has quietly shifted from a fascinating experiment to a core operational system. Enterprises aren’t just building prototypes anymore — they’re deploying LLMs into production environments where uptime directly affects customer interactions, revenue flows, and business continuity. AI has essentially become a new layer of critical infrastructure. Because of that shift, the definition of “reliability” is changing.

KubeCon NA 2025: Universal Mesh, federation, and the end of the "mesh tax"

At KubeCon, we asked a simple question at our booth: "How much is your service mesh costing you?" The answers were eye-opening. Engineers shared stories of 40% resource overhead, multi-second latency spikes during peak traffic, and infrastructure bills that had nearly doubled since mesh adoption. One architect told us they were spending more time managing their mesh than building features.

Improve service reliability and ops culture with Grafana Cloud Service Center

Today's engineering organizations are built around service ownership. Service owners are accountable for keeping their services reliable, performant, and ready to scale. But no service operates in isolation; every team depends on others, and those dependencies form a complex web that can be hard to see, let alone understand. To deliver truly reliable systems, you need visibility not only into how your own service performs, but also into how it affects others.

What's Next for NaaS? Top Trends for 2026

Learn how private connectivity, regional hubs, and AI-driven automation are defining the next evolution of enterprise networking in 2026. 2026 is shaping up to be a big year for networking. We're moving past the idea of simply being connected: networks are now becoming intelligent. As our customers lean into AI, multicloud, and automation in every corner of their operations, the way they connect everything is changing just as fast.

AI Agent for Business SLA Predictions: Safeguarding Business Continuity with Predictive Intelligence

Modern business functions rest on the promise of a smooth, seamless experience, with no downtime and no long waits for backend processes to finish. For such digital operations, timely execution of business processes such as financial closings, order fulfilment, and report generation is non-negotiable.

Monitor Claude Code adoption in your organization with Datadog's AI Agents Console

AI coding assistants are quickly becoming a core part of software engineering workflows, helping developers write, refactor, and review code faster. But without effective monitoring, it can be difficult to know whether these tools are performing reliably and proving useful to engineers. As organizations scale their use of tools like Claude Code, key questions emerge.

Accelerate investigations with AI-powered log parsing

When debugging production issues, investigating security incidents, or analyzing network traffic, engineers and analysts need not only to find the right logs but also to make sense of the dense, unstructured data generated by different systems. Logs rarely arrive laid out in a way that supports filtering, faceting, or graphing for every possible scenario. As a result, teams often find themselves writing regular expressions or custom parsers on the fly, which can be error-prone and time-consuming.
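For illustration, a hand-written, on-the-fly parser of the kind described here might look like the minimal Python sketch below. It is not the AI-powered parsing feature the post introduces. The log format, field names, and the `parse_line` helper are hypothetical, chosen only to show how a regex turns an unstructured line into fields you can filter on, and how it silently misses lines of any other shape.

```python
import re
from typing import Optional

# Hypothetical access-log format, assumed purely for illustration:
# 2025-11-18T09:14:02Z 10.0.0.7 GET /api/orders 503 1842ms
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\S+)\s+"
    r"(?P<client_ip>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"(?P<method>[A-Z]+)\s+"
    r"(?P<path>\S+)\s+"
    r"(?P<status>\d{3})\s+"
    r"(?P<latency_ms>\d+)ms"
)


def parse_line(line: str) -> Optional[dict]:
    """Return structured fields for one log line, or None if it doesn't match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert numeric fields so they can be compared and graphed.
    fields["status"] = int(fields["status"])
    fields["latency_ms"] = int(fields["latency_ms"])
    return fields


logs = [
    "2025-11-18T09:14:02Z 10.0.0.7 GET /api/orders 503 1842ms",
    "2025-11-18T09:14:03Z 10.0.0.9 POST /api/cart 200 87ms",
    "kernel: eth0 link down",  # a different shape: the pattern simply skips it
]

parsed = [p for p in (parse_line(line) for line in logs) if p is not None]
# Once fields are structured, filtering becomes a simple comparison.
server_errors = [p for p in parsed if p["status"] >= 500]
```

The fragility is the point: every new log shape needs another pattern, and lines that don't match are dropped without warning, which is the maintenance burden the post describes.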