Kubernetes v1.35: The Release That Tackles the Industry's $100 Billion Waste Problem

Kubernetes v1.35 dropped a couple of weeks ago, and while the headlines focus on gang scheduling and in-place resizing going GA, there’s a bigger story here that every platform team needs to understand: Kubernetes is finally acknowledging that cluster utilization is fundamentally broken. At Komodor, we work with hundreds of organizations running Kubernetes at scale.

7 Kubernetes Predictions for 2026 - AI Will Push SRE to its Limit

As AI workloads shift from training to massive-scale inference, SRE teams are about to feel even more pressure. GPU-heavy computing is breaking the assumptions today’s clusters were built on, while enterprises are beginning to trust autonomous operations and cost pressure is pushing consolidation across the cloud-infrastructure stack.

[Webinar] Accelerating Kubernetes Intelligence: Cisco's Platform Evolution

Join Hasith Kalpage, Director of Platform Engineering, and Arthur Drozdov, Agentic AI Engineer, as they share how Cisco is using Komodor’s Klaudia Agentic AI to evolve its platform strategy, unlock a smoother developer experience, slash MTTR, and reduce bottlenecks across the enterprise. Includes a live demo of the CAIPE platform!

KubeCon Atlanta 2025 & the AI-Native Shift

KubeCon + CloudNativeCon North America 2025 in Atlanta marked a definitive moment for cloud-native infrastructure. Over four days, celebrating the 10th anniversary of both CNCF and Kubernetes, more than 9,000 attendees witnessed the ecosystem’s evolution from container orchestration to AI-native operations. The conference delivered a clear message – AI workloads are no longer experimental.

Building Trust in AI-Powered Kubernetes Ops: Why "Good Enough" Is a Production Killer

The air in the operations world is thick with AI and LLMs. Every vendor is rushing to slap an “AI-powered” badge on their product. But here’s the uncomfortable truth: in high-stakes Kubernetes operations, one bad AI recommendation can destroy months of trust-building in an instant. We aren’t building a chatbot to suggest recipes. We are building systems that, armed with kubectl permissions, have the potential to take down production with a single wrong command.

The War Room of AI Agents: Why the Future of AI SRE is Multi-Agent Orchestration

We’ve all been there. It’s 2 AM, your phone is buzzing with alerts, and you’re suddenly thrust into an incident war room with a dozen other bleary-eyed engineers. The production environment is on fire, customers are affected, and everyone’s trying to piece together what went wrong. But here’s what makes these moments fascinating from a systems perspective – it’s rarely just one person silently fixing the issue in isolation.

Cost Optimization Is Now Part of the SRE Playbook

In the era of cloud-native architectures, Site Reliability Engineering (SRE) has matured from a discipline focused purely on uptime to a sophisticated practice of efficient reliability. The key driver for this evolution is an undeniable truth: cloud spend has become intrinsically linked to system stability.

Welcome to the Next Frontier: AI on Kubernetes

Last week’s KubeCon Atlanta made one thing abundantly clear: Kubernetes is quickly becoming the de facto platform for AI workloads. The event lineup was chock full of talks, workshops, and even co-located events dedicated to AI, machine learning, and running data on Kubernetes natively, with approximately 50 (!) sessions in total focused on AI, ML, LLM, and GenAI topics. What was until now mostly PoCs and aspirations is now truly delivering in production.

Lessons from KubeCon: What "Best-of-Breed" AI SRE Really Requires

This year’s KubeCon underscored a real shift: AI SRE has gone mainstream. Of course, it’s not a surprise. Teams from high-growth startups to Fortune 500s are running more complex, cloud-native systems, shipping more AI-generated code, and facing rising expectations. Downtime is absolutely not an option and the work for on-call SREs has become unsustainable. The question isn’t whether AI SRE helps. It’s which one you can trust in production.