Improve service reliability and ops culture with Grafana Cloud Service Center

Today’s engineering organizations are built around service ownership. Service owners are accountable for keeping their services reliable, performant, and ready to scale. But no service operates in isolation; every team depends on others, and those dependencies form a complex web that can be hard to see, let alone understand. To truly deliver reliable systems, you need visibility not only into how your own service performs, but also how it affects others.

KubeCon NA 2025: Universal Mesh, federation, and the end of the "mesh tax"

At KubeCon, we asked a simple question at our booth: "How much is your service mesh costing you?" The answers were eye-opening. Engineers shared stories of 40% resource overhead, multi-second latency spikes during peak traffic, and infrastructure bills that had nearly doubled since mesh adoption. One architect told us they were spending more time managing their mesh than building features.

AI Infrastructure Is Creating a New Wave of Incidents, And Why Enterprises Need a Modern On-Call Strategy

Over the last few years, AI has quietly shifted from a fascinating experiment to a core operational system. Enterprises aren’t just building prototypes anymore — they’re deploying LLMs into production environments where uptime directly affects customer interactions, revenue flows, and business continuity. AI has essentially become a new layer of critical infrastructure. Because of that shift, the definition of “reliability” is changing.

All Is Calm, All Is Compliant: Staying Audit-Ready Through the Year-End Rush

As the year winds down, I find that most cybersecurity and compliance teams are focused on closing projects, hitting targets, and maybe even planning a well-earned break. But regulators? They don’t take holidays. FCA, PRA, GDPR – they remain vigilant, and so should you. For IT leaders, this season often feels like walking a tightrope: balancing operational demands with the relentless need for compliance.

From FinOps for AI to AI-Native FinOps

One year ago, at AWS re:Invent, we launched CloudZero Advisor, a free, standalone AI assistant that enables anyone to ask questions about cloud spend in plain language. It was the first experiment of its kind in FinOps, a chance to see what people really wanted to know when cost data finally became conversational. Over the past year, Advisor has become a learning engine.

Information as a Strategic Weapon: Building the Architecture of Advantage

Information dominance has become key to battlefield success. The evolution from Network-Centric Warfare to Multi-Domain Operations (MDO) and JADC2 is all about connecting drones, sensors, weapon systems, and decision-makers across land, air, sea, cyber, and space in real time. Read about the journey, principles, and building blocks, and how Ribbon Communications’ solutions are in the middle of it.

Stop tool sprawl - Welcome to Terraform/OpenTofu support

Provisioning cloud resources shouldn’t require a second stack of tools. With Qovery’s new Terraform and OpenTofu support, you can now define and deploy your infrastructure right alongside your applications. Declaratively, securely, and in one place. No external runners. No glue code. No tool sprawl.

How To Migrate Away From DogStatsD Using Telegraf

Datadog is a popular monitoring platform, and one of its key components is DogStatsD, a customized extension of the original open-source StatsD protocol. DogStatsD adds powerful features like tagging, histograms, and distributions, but it also introduces vendor lock-in, because DogStatsD metrics follow a specific wire format that many other monitoring platforms do not natively support.
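To make the wire-format difference concrete, here is a minimal Python sketch. It assumes the commonly documented DogStatsD datagram shape, `<name>:<value>|<type>|@<sample_rate>|#<tag>:<value>,...`, where the trailing `#` tag section is the DogStatsD extension that a plain StatsD receiver may not understand. The `Metric` class and both helper functions are hypothetical names used for illustration only; they are not part of Datadog or Telegraf.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    """One parsed metric datagram (hypothetical container for illustration)."""
    name: str
    value: str
    type: str
    sample_rate: float = 1.0
    tags: dict = field(default_factory=dict)


def parse_dogstatsd(line: str) -> Metric:
    """Parse a single-value DogStatsD line into a Metric.

    Assumes the shape <name>:<value>|<type>[|@<rate>][|#<k>:<v>,...].
    Packed multi-value payloads are not handled in this sketch.
    """
    head, *sections = line.strip().split("|")
    name, _, value = head.partition(":")
    if not name or not value or not sections:
        raise ValueError(f"not a metric datagram: {line!r}")

    metric = Metric(name=name, value=value, type=sections[0])
    for section in sections[1:]:
        if section.startswith("@"):
            # Optional sample rate, e.g. "@0.5"
            metric.sample_rate = float(section[1:])
        elif section.startswith("#"):
            # DogStatsD tag extension, e.g. "#env:prod,region:us-east"
            for tag in section[1:].split(","):
                key, _, val = tag.partition(":")
                metric.tags[key] = val
    return metric


def to_plain_statsd(metric: Metric) -> str:
    """Render the metric without the tag section, as a plain StatsD line."""
    line = f"{metric.name}:{metric.value}|{metric.type}"
    if metric.sample_rate != 1.0:
        line += f"|@{metric.sample_rate}"
    return line


if __name__ == "__main__":
    m = parse_dogstatsd("page.views:1|c|@0.5|#env:prod,region:us-east")
    print(m.tags)              # tags carried only by the DogStatsD format
    print(to_plain_statsd(m))  # what a tag-unaware receiver could accept
```

Dropping the tag section, as `to_plain_statsd` does, loses the dimensions that the tags carried. That loss is why a migration typically relies on a collector that understands the extended format rather than on stripping it.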