How Cisco Revolutionized Platform Engineering with Komodor's Agentic AI

In the world of cloud-native infrastructure, complexity is the silent killer of innovation. For Cisco Outshift, the company’s incubation engine, managing a sprawling environment of AWS EKS clusters and edge-based MicroK8s workloads created a classic bottleneck: the Platform Engineering team was drowning in toil. Facing SRE burnout and the limits of human scaling, Cisco embarked on an ambitious journey to evolve its internal operations from standard DevOps to Agentic AI.

How Qovery uses Qovery to speed up its AI project

Discover how Qovery leverages its own platform to accelerate AI development. Learn how an AI specialist deployed a complex stack, including LLMs, QDrant, and KEDA, in just one day without needing deep DevOps or Kubernetes expertise. See how the "dogfooding" approach fuels innovation for our DevOps Copilot.

#051 - Surviving the Shift: From Legacy Monoliths to Day 2 Chaos with Hayato Shimizu (Digitalis)

From the early days of "neural nets" and WebSphere to the modern complexities of Kubernetes, Hayato Shimizu has seen the evolution of infrastructure firsthand. In this episode of Kubernetes for Humans, the co-founder of Digitalis joins the show to discuss the harsh realities of enterprise platform engineering and his personal journey from corporate employee to consultancy owner.

Kubernetes Logging Best Practices

You’re sitting at your desk, typing away, when all of a sudden you hear a “ping!” Unfortunately, you have a browser with fifteen tabs, a task management application, email, messaging applications, and calendars all open, making it difficult to know exactly which one just pinged you. To identify the source, you open your system settings and look at the notifications section to see which ones you allow to make a sound.

How Modern Network Analytics Drive Faster, More Reliable Applications

Your users face sluggish performance and spotty connections daily. Hybrid cloud paths, SaaS platforms, SD-WAN routes, and Wi-Fi networks all contribute to this frustration. Microsoft recently revealed they handled a 2.4 Tbps DDoS attack on Azure, proving how enormous network events quietly erode application quality without causing total blackouts.

Kubernetes Cost Traps: Fixing What Your Scheduler Won't | Harness Blog

Kubernetes cost overruns usually come from small, invisible scheduling decisions—not the platform itself. Over-provisioned requests, poor bin packing, and fragmented node pools quietly waste cloud spend. Cost-aware scheduling, right-sizing, and smarter node selection can deliver major savings without hurting performance. Treat cost as a first-class metric with visibility into why scaling decisions happen—not just when.
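As a rough illustration of how over-provisioned requests and poor bin packing show up in numbers, here is a minimal Python sketch. The `Pod` class, the helper names, and all figures are hypothetical and chosen for illustration only; they are not taken from the Harness article or from any Kubernetes API.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    cpu_request: float  # cores requested in the pod spec (hypothetical values)
    cpu_used: float     # cores actually consumed, e.g. an observed peak


def request_waste(pods):
    """Cores that were requested but never used, summed over all pods."""
    return sum(max(p.cpu_request - p.cpu_used, 0.0) for p in pods)


def packing_efficiency(pods, node_cores, node_count):
    """Fraction of provisioned node capacity that is covered by requests."""
    return sum(p.cpu_request for p in pods) / (node_cores * node_count)


pods = [
    Pod("api", cpu_request=2.0, cpu_used=0.5),
    Pod("worker", cpu_request=4.0, cpu_used=3.0),
    Pod("cron", cpu_request=1.0, cpu_used=0.25),
]

waste = request_waste(pods)
efficiency = packing_efficiency(pods, node_cores=8, node_count=2)
print(f"wasted cores: {waste}, packing efficiency: {efficiency:.1%}")
```

In this made-up case, 7 cores are requested across 16 provisioned cores, and 3.25 of the requested cores sit idle, which is the kind of quiet waste the blurb describes.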

Webinar (Jan 15 2026): Take Back Control of Your Infrastructure (feat. nvisia)

Learn how leading teams are reducing complexity, controlling costs, and building resilient environments with modern private cloud patterns. If you’re evaluating private cloud, hybrid infrastructure, or looking to take back control of your infrastructure in 2026, this session provides a clear, actionable starting point. Reach out to our team to learn more today!

The 4 pillars of AI in 2026: Agents, cost, observability & sovereignty

AI is no longer just about "one-shot" prompts. In this session from our "From Idea to Agent" webinar, Ben Norris (AI Engineer at Civo) breaks down the four key priorities dominating the enterprise space in 2026. From the 130x explosion in token usage to the "vibe-coding" revolution, learn why businesses are turning away from US hyperscalers in favor of democratized, secure, and UK-sovereign AI infrastructure. We explore how autonomous agents are solving multi-step problems and why "Chain of Thought" reasoning is unlocking AI for heavily regulated industries like finance and healthcare.

AI SRE in Practice: Resolving Node Termination Events at Scale

When a node terminates unexpectedly in a Kubernetes cluster, the immediate symptoms are obvious. Workloads restart elsewhere, services experience partial outages, and alerts fire across multiple systems. The harder question is why it happened and how to prevent it from recurring. This scenario walks through a node termination event where the entire node pool was affected, requiring investigation across infrastructure layers to identify root cause and implement lasting remediation.

Cloud sovereignty vs. Cloud innovation: Why India doesn't have to choose

As we witness the rise of AI, the need for sovereignty is no longer optional. For organizations deploying larger models with access to sensitive data, it is a requirement. Research has shown concerns around sovereignty ‘hindering innovation’ and having ‘knock-on consequences for innovation’. We don’t see it that way. Sovereignty isn’t a trade-off for innovation; in fact, for India to scale securely, the two must work in tandem.

From idea to agent: Building AI workflows with relaxAI and n8n

Join us for this live online webinar as we explore how to design, build, and deploy practical AI agents using n8n’s workflow automation platform powered by relaxAI’s UK sovereign infrastructure. Our speaker, Ben Norris, AI Engineer at Civo, will guide you through the real-world process of creating intelligent agents that automate tasks across tools and services, all without deep coding expertise.

[Webinar] Building Quality-Driven Agentic AI in Noisy Big Data Environments

Watch as Itiel Shwartz, Komodor CTO and Co-Founder, shares hard-won lessons from developing an AI SRE agent that processes millions of K8s events daily to deliver autonomous troubleshooting that reached 95%+ accuracy in benchmarking. This webinar covers building production-ready systems that maintain reliability when 90% of your data is noise.

An introduction to GPU time-slicing

GPUs are no longer a niche component. Gamers know them for immersive graphics, workstation users rely on them for balanced performance, and in the age of AI, GPUs have become one of the most in-demand resources in modern infrastructure. They are also expensive. That reality creates two immediate constraints, for individuals and enterprises alike: GPU-backed instances should be provisioned deliberately, and once provisioned, they should be used efficiently.

2026 insights into the Indian cloud market

India is no longer just a fast-growing cloud market; it is becoming a strategically vital one. What was once a race for cost efficiency and global hyperscaler expansion has evolved. Today, India’s cloud landscape is being reshaped by a new reality: the need for AI infrastructure, true data sovereignty, and the ambition to own its digital future. Following the discussion at Civo Navigate India 2025, one thing is clear: the status quo is shifting.

How is the next wave of AI impacting the Indian cloud scene?

Gartner has predicted that 2026 will see a 10.6% increase in India’s total IT spend from 2025 (2025: USD 159 billion vs 2026: USD 176.3 billion), with data centres, cloud infrastructure, and AI-enabled technologies driving this growth. This isn’t just a budget increase; it’s a fundamental shift in where innovation happens, who owns the infrastructure, and how we translate AI potential into scalable impact.

Ingress NGINX Project Is Retiring: A Step-by-Step Guide to Replacing the Ingress NGINX Controller

The Ingress NGINX Controller is approaching retirement, and teams need a clear path forward to manage Kubernetes ingress traffic securely and reliably. To make this transition easier, we’ve created a single, curated hub with all the relevant blogs and webinars. This hub serves as your one-stop resource for understanding the migration to Kubernetes Gateway API with Calico Ingress Gateway.

Introducing The First Graylog Helm Chart Beta V1.0.0

Running Graylog on Kubernetes has been possible for a while, but let’s be honest: it usually involved a fair amount of DIY. Custom manifests, duct-taped values files, and more than one late-night kubectl describe pod. That changes today. We’re releasing the first-ever Graylog Helm chart for Kubernetes — now available in beta.

Why data sovereignty has become a strategic imperative for India

Data is the backbone of the modern economy, but if you don’t have control of the infrastructure, you don’t control the data. Historically, data sovereignty was a compliance checkbox, something for the legal team to handle. Today, it’s a strategic national priority. At Civo Navigate, we sat down with industry experts to unpack why India is now placing sovereignty at the centre of its digital strategies.

The next wave of AI: Open source, robotics & the future of India's tech powerhouse

As we kick off 2026, the tech landscape is being reshaped by the very breakthroughs discussed at Civo Navigate India 2025. This panel, featuring Josh Mesout, Murthy Chitlur, Chirotpal Das and Anjali Batra, laid the groundwork for the AI-driven world we are operating in today. From the rise of agentic AI and small language models to the massive shift toward open-source parity, these experts didn't just discuss trends; they provided the blueprint for building resilient, sovereign, and scalable AI infrastructure in India.

AI SRE in Practice: Diagnosing Configuration Drift in Deployment Failures

Deployments fail for dozens of reasons. Most of them are obvious from the error messages or pod events. But when a deployment rolls out successfully according to Kubernetes but your application starts experiencing latency spikes and error rate increases, the investigation becomes significantly harder. This scenario walks through a configuration drift incident where the deployment appeared healthy but available replicas were constantly flapping, creating cascading reliability issues.

India's path to digital independence: AI, Cloud, and Sovereignty

Digital sovereignty has moved from theory to necessity as organizations grapple with data control and independence. At Civo Navigate India 2025, Rahul Poruri, Toshal Khawale, Deepthi Anantharam, and Kunal Kushwaha examined how nations are balancing innovation with the need for multi-jurisdictional compliance.

Why container security only works when the platform owns it

Container security has finally gone mainstream. When Docker announced hardened container images in late 2025, complete with minimal attack surfaces, non-root defaults, continuous CVE scanning, and automated updates, the response was enthusiastic. For teams managing their own infrastructure, this was a real step forward. Secure-by-default containers are no longer niche or expensive. They are expected.

Kubernetes Networking at Scale: From Tool Sprawl to a Unified Solution

As Kubernetes platforms scale, one part of the system consistently resists standardization and predictability: networking. While compute and storage have largely matured into predictable, operationally stable subsystems, networking remains a primary source of complexity and operational risk. This complexity is not the result of missing features or immature technology.

From IPVS to NFTables: A Migration Guide for Kubernetes v1.35

Kubernetes v1.35 marks an important turning point for cluster networking. The IPVS backend for kube-proxy has been officially deprecated, and future Kubernetes releases will remove it entirely. If your clusters still rely on IPVS, the clock is now very much ticking. Staying on IPVS is not just a matter of running older technology. As upstream support winds down, IPVS receives less testing, fewer fixes, and less attention overall.

AI SRE in Practice: Resolving GPU Hardware Failures in Seconds

When a pod fails during a TensorFlow training job, the investigation usually starts with the obvious questions. The answers rarely come quickly, especially when the failure involves GPU hardware that most engineers don’t troubleshoot regularly. This scenario walks through an actual GPU hardware failure and shows how AI-augmented investigation changes both the time to resolution and the expertise required to handle it.

Harness | Docker Artifact Registry | How to Push and Pull Images

This video provides a clear and practical walkthrough of the Harness Artifact Registry, demonstrating how to work with Docker images in a secure and reliable manner. You will see the complete flow of pushing images into the registry and pulling them back for builds, deployments, and platform workflows. The goal is to help developers and platform engineers understand how the registry fits into everyday delivery pipelines.

Is Kubernetes actually HARD? #speedscale #kubernetes #k8s #devops #cloudnative

Thinking about learning Kubernetes in 2026? You’ll need GitOps, kubectl, and CI/CD pipelines... OR you can just use Speedscale. See how a single operator replaces a million dependencies and gives you the traffic insights you actually need to survive production.

Kubernetes is Hard. Here is the "Easy Mode" for 2026

Is Kubernetes actually hard, or are we just using the wrong tools? In 2026, the Kubernetes ecosystem has become a "dependency jungle." Between GitOps, YAML configuration, kubectl mastery, and complex CI/CD pipelines, developers are spending more time managing infrastructure than writing code. In this video, Ken breaks down the "hard parts" of K8s and introduces a more efficient workflow using Speedscale. Learn how to gain instant visibility into your cluster, pull logs without the headache, and turn real-world traffic into actionable load tests.

Top 7 Kubernetes Add-ons

The open-source Kubernetes platform is designed to help simplify application deployment through Linux containers. It supports tasks like deploying workloads in the form of pods, clustering nodes, managing container runtimes, and tracking resources. The Kubernetes microservices system has risen in popularity over the last several years as an easy way to support, scale, and manage applications.

When is it ok or not ok to trust AI SRE with your production reliability?

There’s a moment every engineer knows. An AI suggests a fix, it looks reasonable, maybe even obvious, but production is on the line and you hesitate before clicking execute. There’s a big difference between an AI that can recommend an action and one you’re willing to let take that action. All it takes is one bad call, one kubectl command that makes things worse, and suddenly every automated suggestion is a potential liability instead of a help.

Inside Qovery's security architecture: how we secure your cloud & Kubernetes infrastructure

Discover how Qovery bridges the gap between developers and infrastructure with a "security by design" approach. From federated identities and unique encryption keys to real-time audit logs and SOC2 Type 2 certification - see how we protect your data while eliminating vendor lock-in.

Key Insights from the 2025 GigaOm Radar for Container Networking

In 2025, as modern applications became ever more distributed and the use of Kubernetes continued to proliferate, the role of container networking was critical. Today’s enterprises demand networking solutions that can scale, secure, and connect services reliably, whether those services run across multiple clouds, hybrid environments, or on-premises clusters.

How companies are using Civo GPUs to accelerate AI innovation without runaway costs

Accessing high-performance GPUs shouldn’t feel like a bottleneck. Yet, as AI adoption accelerates, many teams are discovering that hyperscaler offerings often come with a hidden price: long wait times, opaque billing, and layers of unnecessary complexity. At Civo, we’ve seen a different way. Our GPUs enable companies to move faster while keeping infrastructure overhead and costs firmly under control.

How to achieve cloud agility without compromising control or cost

As organizations increasingly embrace digital transformation, cloud agility has become a critical priority. Yet, the promise of cloud-native speed and flexibility often comes with trade-offs: loss of control, unpredictable costs, and operational complexity. Many companies find themselves stuck between the desire for agility and the reality of legacy infrastructure or regulatory constraints. At Civo, we don't think you have to choose. We’ve spent years helping teams navigate this tension.

How Kubernetes Node Affinity Works (And Why It Matters for K8s Cost Control)

Think about how airlines assign seats on a plane. Some have extra legroom. Some sit near exits. Some are cheaper, while others cost a premium. Certain passengers also have strict requirements, like families traveling together or travelers who paid for a specific class. Now imagine boarding everyone randomly. A passenger who paid for extra legroom (perhaps for health reasons) ends up squeezed into a middle seat. Families scatter across the cabin. Premium seats sit half empty while the back rows overflow.

From Promise to Practice: What Real AI SRE Can Actually Do When Production Breaks

We’ve written before about the advantages of training an AI SRE on real telemetry data rather than generic Kubernetes documentation. We’ve explained why RAG augmentation based on actual high-scale workload patterns produces better results than LLMs trained on generic scenarios or forum threads. The theory makes sense, the architecture is sound, and the approach is defensible.

Podman vs Docker 2026: Security, Performance & Which to Choose

When it comes to containerization technologies, Podman and Docker are the two giants that often come up in conversation. Both have revolutionized how we build, deploy, and manage containers, but what sets them apart? In this blog, we'll dive deep into a side-by-side comparison of Podman and Docker. We'll cover everything from architecture to security, performance, and compatibility.

EP #3: Cloud, Kubernetes, and the Evolution of DevOps - The Open Source Observability Podcast

Kris Buytaert is the Co-founder of Inuits, O11y, and ‘DevOps Days,’ an internationally attended series of DevOps events. He is a passionate advocate of Free and Open Source Software, and is credited by the community as a founding instigator of the DevOps movement. In this episode we trace the history of the DevOps movement from its intersection with open source and Agile, through the evolution of Cloud technologies and tools such as Docker and Kubernetes, to present-day best practices for CI/CD, monitoring, and observability.