Operations | Monitoring | ITSM | DevOps | Cloud

How Kubernetes Operators May Conflict With Resource Optimization (And How to Avoid It)

A Kubernetes Operator is a method of packaging, deploying, and managing a Kubernetes application. It extends the native Kubernetes API by combining custom resources (CRDs) with a dedicated controller: a custom control loop that continuously watches the state of those resources. The primary purpose of an operator is to automate complex, stateful applications (like databases, message queues, or monitoring suites) that require human operational knowledge to maintain.

New in Kubex: KAI Scheduler Integration for Shared GPU Inference

Today, we’re launching Kubex support for the KAI Scheduler and automated GPU sharing for inference workloads. As AI inference moves into production, platform teams are being asked to serve more models, support more teams, and control GPU costs at the same time. But many inference workloads do not need an entire GPU all the time. When teams reserve full GPUs or oversized GPU fractions to stay safe, expensive capacity can sit idle across the cluster.

The Inference Paradox: How Split-Brain LLMs Are Killing Your GPU ROI

During the Toronto KCD (Kubernetes Community Days), I attended an insightful talk on AI resource optimization that highlighted a staggering Gartner study: “AI infrastructure is adding $401 billion in new spending this year alone. Yet, real-world audits tell a much darker story, revealing that average GPU utilization in the enterprise is stuck at a dismal 5%”. While many people in the audience were shocked by that number, the data didn’t come as a surprise to us.

From Visibility to Real Savings: Turning FinOps Insights into Measurable Cost Reduction

FinOps programs are maturing, and most organizations have better visibility into cloud spend than ever before. Dashboards are full of data. And yet costs keep climbing. The problem isn’t the data. It’s the gap between knowing where the waste is and actually eliminating it. In this joint session, Tangoe and Kubex come together to bridge that gap. Tangoe brings deep expertise in spend management and FinOps discipline, while Kubex delivers infrastructure-level optimization across cloud, Kubernetes, and the AI and GPU workloads that are rapidly becoming the next frontier of cost pressure.

10 Enterprise AI Infrastructure Voices Worth Following

Enterprise AI has crossed an inflection point. The model problem is largely covered. What remains unsolved is the operational impact: how to run AI inference and agentic processes continuously, reliably, and at a cost that doesn’t cancel out the value. Most enterprises are discovering this the hard way. GPU utilization dashboards show 80%. Actual compute efficiency is half that. Token demand is compounding at 200-500% annually as agents multiply every action into dozens of model calls.

Kubernetes Optimization Beyond Requests and Limits - Node Scaling Blockers

Many of us understand the concept of Kubernetes Requests and Limits, and that by reducing over-sized resource requests we can reduce waste in our clusters. And for GKE Autopilot and EKS Fargate clusters that is true. Because you’re being billed directly for the resources you’re requesting, driving down requests can result in instantaneous savings. However in most hosted Kubernetes environments you’re not actually being billed for requests.

Autonomous K8s Optimization Involves Both Compute and Storage Resources - Are You Doing Both?

One of the most powerful capabilities in K8s is the ability to autoscale resources to meet demands, scaling resources up during peak periods to ensure performance, and down again during lower periods to save money. In this joint session, Lucidity and Kubex walk through what end-to-end K8s optimization looks like when you address both layers together. We cover: Expect real examples, not slides full of theory. You’ll leave with a clear picture of where waste is hiding in your environment and a prioritized approach to addressing it.

15: Optimizing AI Workloads: Balancing Cost, Performance, and Scalability with Bijit Ghosh

In this episode, Andrew Hillier and Bijit Ghosh discuss the evolving landscape of AI, discussing the growing prominence of inference over training, hybrid cloud strategies, balancing cost with performance, and the orchestration of complex hardware environments. The conversation also touches on emerging concepts like AI factories, the challenges of sovereign cloud, and how enterprises are navigating data gravity and regulatory constraints. It's a deep dive into optimizing AI infrastructure, managing costs, and the disruptive changes that are transforming both technology and business outcomes.