
The latest News and Information on Containers, Kubernetes, Docker and related technologies.

Cloud freedom with AI built in

Most cloud providers give you the hardware and leave you to figure out the rest. Civo AI is different. Chief Innovation Officer Josh Mesout explains how Civo thinks strategically about AI adoption, guiding organisations through the full lifecycle from planning and infrastructure through to running and scaling workloads, powered by best-in-class NVIDIA GPUs.

What is the sovereignty tax, and is your organization paying it?

Most organizations know cloud costs are rising. Fewer realize that some of what they're paying isn't for infrastructure at all; it's a penalty for not being in control of it. That penalty has a name: the sovereignty tax. It isn't a line item on your invoice. It won't appear in your cloud dashboard. But it's accumulating quietly, in egress fees, outage exposure, audit blind spots, and the creeping realization that leaving your current provider would be harder, and more expensive, than you ever anticipated.

Building vs. Buying your platform: The honest framework nobody discusses

Most organizations get the build versus buy decision wrong in the same way: they underestimate the cost of building while overestimating the cost of buying. In the recent Konstruct monthly webinar with M R Rishi (Platform Engineer at Civo), we explored whether you should build or buy your platform. The full discussion is available as a recording.

How AI is changing platform engineering

AI is changing software development fast. But what does that actually mean for platform engineering teams? In this conversation, Civo's John Dietz and M R Rishi dig into what they're seeing on the ground: the 10x effect of AI on app count, what it means for platform team workloads, the debugging skills that are quietly being lost, and whether Kubernetes itself might eventually become just another abstraction.

How Kubernetes Operators May Conflict With Resource Optimization (And How to Avoid It)

A Kubernetes Operator is a method of packaging, deploying, and managing a Kubernetes application. It extends the native Kubernetes API by combining custom resources, defined through CustomResourceDefinitions (CRDs), with a dedicated controller: a custom control loop that continuously watches the state of those resources and works to bring it in line with the declared spec. The primary purpose of an operator is to automate the operation of complex, stateful applications (like databases, message queues, or monitoring suites) that would otherwise require human operational knowledge to maintain.
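
The control loop described above can be illustrated without a cluster. What follows is a minimal sketch in plain Python, not code from any real operator: `QueueSpec`, `QueueStatus`, `reconcile`, and `run_control_loop` are hypothetical names, and the "cluster" is simulated by assuming every action succeeds.

```python
from dataclasses import dataclass


@dataclass
class QueueSpec:
    """Desired state, as a user would declare it in a custom resource (hypothetical)."""
    replicas: int


@dataclass
class QueueStatus:
    """Observed state, as a controller would read it back from the cluster (hypothetical)."""
    replicas: int


def reconcile(spec: QueueSpec, status: QueueStatus) -> list[str]:
    """Return the actions needed to move the observed state toward the desired state."""
    diff = spec.replicas - status.replicas
    if diff > 0:
        return [f"create replica {status.replicas + i}" for i in range(diff)]
    if diff < 0:
        return [f"delete replica {status.replicas - 1 - i}" for i in range(-diff)]
    return []  # already converged; nothing to do


def run_control_loop(spec: QueueSpec, status: QueueStatus, max_iterations: int = 10) -> list[str]:
    """Call reconcile repeatedly until it reports no further actions."""
    log: list[str] = []
    for _ in range(max_iterations):
        actions = reconcile(spec, status)
        if not actions:
            break
        log.extend(actions)
        # Assumption for this sketch: every action succeeds immediately.
        status.replicas = spec.replicas
    return log
```

Calling `run_control_loop(QueueSpec(replicas=3), QueueStatus(replicas=1))` returns two "create" actions and leaves the status at three replicas. A real controller would react to watch events from the API server and handle partial failures rather than looping over in-memory objects.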

How we saved over $3 million in idle compute costs with Datadog Kubernetes Autoscaling

At Datadog, our broad Kubernetes footprint amplifies the significance of a familiar autoscaling tradeoff: Overprovisioning wastes cloud spend, while underprovisioning threatens reliability. We built Datadog Kubernetes Autoscaling (DKA) to help teams rightsize their workloads by generating intelligent resource recommendations and automating multidimensional workload scaling. Across Datadog, adopting DKA has eliminated more than $3 million in annualized idle compute costs while reducing reliability risks.
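
To make the overprovisioning tradeoff concrete, here is a minimal rightsizing sketch in Python. It is not how DKA computes its recommendations, which the summary above does not describe; it shows one common generic approach, a usage percentile plus headroom. The function names, the nearest-rank percentile, and the default values are all assumptions chosen for illustration.

```python
import math


def recommend_request(usage_samples, percentile=0.95, headroom=0.15):
    """Suggest a resource request from observed usage (hypothetical helper).

    Takes the nearest-rank percentile of the samples and adds a safety margin,
    so the request covers most observed peaks without matching the worst case.
    """
    if not usage_samples:
        raise ValueError("usage_samples must not be empty")
    ordered = sorted(usage_samples)
    rank = max(1, math.ceil(percentile * len(ordered)))  # nearest-rank method
    return ordered[rank - 1] * (1 + headroom)


def idle_amount(current_request, recommended_request):
    """Capacity reserved beyond the recommendation; zero if underprovisioned."""
    return max(0.0, current_request - recommended_request)
```

A higher percentile or larger headroom leans toward reliability, while lower values lean toward savings, which is the same tradeoff the paragraph above names.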

The debugging crisis nobody's talking about: AI, abstraction, and the skills gap

Here's a scenario that's playing out in engineering teams across the industry right now. A developer uses AI to rapidly prototype a microservice. The code works. They deploy it to production. Six months later, something breaks. The system is under load, a database connection pool is exhausted, and the service starts failing in subtle ways. The engineer pulls up the code, but here's the problem: they didn't write it. An AI assistant did. They don't understand the flow deeply. They don't know where to look first.

New in Kubex: KAI Scheduler Integration for Shared GPU Inference

Today, we’re launching Kubex support for the KAI Scheduler and automated GPU sharing for inference workloads. As AI inference moves into production, platform teams are being asked to serve more models, support more teams, and control GPU costs at the same time. But many inference workloads do not need an entire GPU all the time. When teams reserve full GPUs or oversized GPU fractions to stay safe, expensive capacity can sit idle across the cluster.