
Ghosts of Servers Past: The Bare-Metal Comeback Story

Bare-metal. Just reading that word might trigger a physical reaction for some of us. Dusty closets, old server rooms, and loud rigs that never seemed to work quite right. Remember waiting days for IT to provision a server, only to realize your ticket got lost in the shuffle? Or the classic "well, it worked on my machine" excuse right before a production push? Ah, the good old days.

[Webinar] Conquering the Complexity of Self-Hosted Apps with Agentic AI SRE

Most enterprise SaaS products, like Komodor’s Autonomous AI SRE Platform, require installing a remote agent on the customer’s infrastructure, which varies significantly from one organization to another, in terms of architecture, configurations, permissions, processes, and more. This “unmanaged” model creates major blind spots, making daily operations, observability, debugging, and incident response challenging. When failures occur, limited visibility and bespoke systems make root-cause analysis slow, incomplete, or impossible.

The Ultimate Kubernetes Cost Monitoring And Management Guide

While Kubernetes enables teams to deliver more value faster, understanding and controlling Kubernetes costs remains challenging. You have disposable, replaceable compute resources constantly coming and going across a range of infrastructure types. Yet at the end of the month, you only get a billing line item for EKS costs and several EC2 instances.

Catch Every Moment in Kubernetes: Splunk's Observability Advantage

Discover why real-time, unsampled observability is critical for Kubernetes environments with Stephane Estevez from Splunk at KubeCon Europe 2026. Learn how Splunk’s unique approach helps you catch every important moment—even when containers vanish in milliseconds. Watch now for expert insights on cloud-native monitoring, observability, and Kubernetes best practices!

The story behind Konstruct: Lessons learned scaling GitOps

In 2024, Civo acquired Konstruct (formerly Kubefirst) to reinforce our commitment to simplifying cloud computing complexities. When this acquisition was made, it began a whole new chapter for the team behind Konstruct. Over the years, we assembled our team by working with a community of thousands of engineers in what can be a very complex cloud native environment. We were fortunate to join forces with Civo as they aligned with our cloud native and portable vision.

Kubernetes Node Vs. Pod Vs. Cluster: What's The Difference?

Kubernetes is increasingly the standard for deploying, running, and maintaining cloud-native applications running in containers. Kubernetes (K8s) automates most container management tasks, empowering engineers to manage high-performing, modern applications at scale. Meanwhile, surveys from VMware and Gartner reveal that insufficient Kubernetes expertise prevents many organizations from fully adopting containerization. Understanding how Kubernetes components work removes this barrier.

When AI Writes the Code, Who Keeps Production Running?

The production environment has become a minefield of code nobody really understands. Here’s what’s happening: Development teams are using Claude Code, Cursor, and GitHub Copilot to ship features at 10x their previous velocity. Product managers are ecstatic. Business stakeholders are thrilled. And somewhere in a war room at 2:17 AM, an SRE is staring at a stack trace for code that was AI-generated three weeks ago, trying to figure out why the payment service just fell over.

The path to self-healing: Re-architecting for massive scale on kubernetes

In the world of network assurance, even a few seconds of delay can result in significant business losses. In this session from Civo Navigate India, Dr. Shivananda R Poojara (Head of Cloud Business Unit, Airowire Networks) explains how his team dismantled a massive monolithic service stack and rebuilt it for a high-performance, cloud-native era in just 75 days.

AI SRE in Practice: Accelerating Engineer Onboarding with Contextual Expertise

Onboarding new engineers to complex Kubernetes environments is expensive. Junior engineers need to learn cluster architecture, understand organizational conventions, navigate internal documentation, and build relationships with senior team members who can answer questions. The process takes weeks or months, and during that time, senior engineers spend significant time mentoring instead of working on complex problems.

AWS vs Google Cloud vs Azure for Cloud-Native and Kubernetes

Cloud adoption is no longer about “moving to the cloud.” It’s about building cloud-native platforms that are scalable, observable, automated, and Kubernetes-driven. This guide provides a deep comparison of AWS, Google Cloud, and Azure, with a focus on Kubernetes, platform engineering, DevOps, and modern workloads, aligned with standards pioneered by the Cloud Native Computing Foundation.

Kubernetes Namespaces: What They Are, How They Work, And What They Don't Solve

Using Kubernetes to manage containerized applications has its fair share of challenges. One of those challenges is managing complexity. Using namespaces can help minimize that complexity. Yet, a common misconception is that using multiple namespaces in a single Kubernetes cluster can degrade performance. Another issue: Kubernetes namespaces can reduce visibility into costs. There’s more to it than that.

Is Your File Integrity Monitoring Outdated? Kubernetes Needs Runtime FIM

If your file integrity monitoring (FIM) still relies on scheduled scans… it was built for static servers — not Kubernetes. In cloud-native environments, traditional FIM creates detection delays, wasted CPU, excessive I/O, and alert noise. And if a malicious process modifies a file and exits before the next scan? You might miss it entirely. In this video, we break down: Modern runtime FIM works differently. Instead of scanning everything on a schedule, it.

Rancher Live: Konveyor's Cloud Native Modernisation Blueprint

Join Divya Mohan as she hosts Savitha Raghunathan, Konveyor maintainer & Red Hat Senior Software Engineer to learn more about the CNCF Sandbox project, Konveyor. Dive into some of the open source strategies for legacy app migration to Kubernetes using the 6 Rs: Rehost, Replatform, Refactor & learn blueprint tools for analysis, containerization & AI-powered refactoring.

How to write annotations in Kubernetes with JSON for Datadog Autodiscovery | Datadog Tips & Tricks

Pod annotations in Kubernetes with invalid JSON syntax can prevent Datadog Autodiscovery from detecting integrations, resulting in missing metrics and gaps in monitoring. Watch this video for a step-by-step process to write annotations. Note: This video focuses on Datadog Autodiscovery v2 syntax.
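As a rough illustration of the failure mode described above, the sketch below checks whether an annotation value parses as JSON before it is applied. It is not taken from the video: the function name `check_annotation_json`, the sample payloads, and the requirement that the top level be a JSON object are all assumptions made for the example, and it does not validate anything specific to Datadog's schema.

```python
import json


def check_annotation_json(value: str) -> tuple[bool, str]:
    """Return (ok, message) for a candidate JSON annotation value.

    Hypothetical helper: it only checks JSON syntax and that the top
    level is an object (an assumption for this sketch).
    """
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError as exc:
        # lineno/colno point at the first character the parser rejected.
        return False, f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
    if not isinstance(parsed, dict):
        return False, "expected a JSON object at the top level"
    return True, "ok"


# Illustrative payloads only; the keys are placeholders, not a verified schema.
good = '{"redisdb": {"instances": [{"host": "%%host%%", "port": 6379}]}}'
bad = "{'redisdb': {'instances': [{'host': '%%host%%', 'port': 6379}]}}"  # single quotes are not valid JSON

if __name__ == "__main__":
    for label, value in (("good", good), ("bad", bad)):
        ok, message = check_annotation_json(value)
        print(f"{label}: ok={ok} ({message})")
```

Running a check like this in CI or a pre-commit hook would catch common slips such as single quotes or trailing commas before the manifest reaches the cluster.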

Don't Panic: A Low-Risk Strategy for Ingress NGINX Retirement

The Ingress NGINX project is winding down. For many organizations, this means planning a migration for critical infrastructure. While the HAProxy Kubernetes Ingress Controller is the natural successor for these workloads, a "rip and replace" strategy isn’t always viable. You might have complex configurations, customized annotations, or deployment freezes that make a sudden switch risky. There's a lower-risk path: Place HAProxy in front of your existing Ingress NGINX deployment.

Build and test your first Kubernetes operator with Go, Kubebuilder, and CircleCI

Kubernetes operators extend the Kubernetes API with custom logic, automating tasks like provisioning, configuration, and policy enforcement. Instead of managing these tasks manually or with ad hoc scripts, operators codify your workflows into controllers that run natively inside the cluster. In this tutorial, you’ll build a simple operator using Go and Kubebuilder, a framework that scaffolds much of the boilerplate so you can focus on core logic.

NVIDIA Rubin (R100) vs. NVIDIA Blackwell (B200) GPU

Since 1999, when NVIDIA invented the GPU (graphics processing unit), demand for it has skyrocketed. At CES 2026, CEO Jensen Huang announced the company's latest GPU, named after Vera Rubin. This follows the announcement of its Blackwell lineup only two years ago. In this blog, we’ll explore what the industry knows about the Vera Rubin so far. Plus, we will take a look at some specs in comparison to the NVIDIA B200 from the Blackwell lineup.

Introducing Konstruct: GitOps-powered IDP in minutes

"I wish I knew about this a couple years ago..." Over my seven years as a cofounder, I've heard some version of that line more than any other. Usually, it comes at the end of a demo to someone who has spent a year getting to something not even close to what they're seeing on my screen. The story is always the same. An organization adopts Kubernetes and arrives at the inevitable conclusion that they need a platform.

AI SRE in Practice: Diagnosing AWS CNI IP Exhaustion Before Widespread Outage

IP address exhaustion in Kubernetes doesn’t announce itself with clear error messages. Pods fail to schedule, services degrade unpredictably, and the symptoms look like a dozen different problems before anyone realizes the cluster has run out of available IP addresses. By the time the root cause becomes clear, multiple services are affected and recovery requires coordination across infrastructure layers.

AI for nuclear safety: Predicting component remaining useful life

As industrial systems become more complex in 2026, the reliability of critical infrastructure depends on shifting from reactive to predictive strategies. In this session from Civo Navigate India, Muthukumar Ganesan, a scientist at the Indira Gandhi Centre for Atomic Research (IGCAR), explores the application of AI and machine learning in securing the future of nuclear energy.

#053 - The Road to Distributed AI and Kubernetes Infrastructure with Matt Butcher (Fermyon) & Ari...

They share their professional origins, highlighting how Kubernetes transitioned from a complex tool for experts to a foundational technology for global enterprises. Part of the conversation focuses on the history of Helm, explaining its growth from a simple hackathon project into a standard package manager. Another part takes on the future of distributed computing, specifically how Akamai is integrating infrastructure as a service to support modern workloads.
Sponsored Post

Kubernetes Load Testing Made Easy with Speedscale

Everybody knows working with Kubernetes is really hard. It's highly complicated. You have to know how to work with YAMLs, there's lots of stuff to deal with. The classic developer experience with YAML. But what if you could get complete visibility into your Kubernetes workloads and run realistic load tests without touching a single YAML file or running kubectl commands? In this walkthrough, I'll show you how Speedscale makes Kubernetes observability and performance testing as simple as point-and-click.

Kubernetes Network Observability: Comparing Calico, Cilium, Retina, and Netobserv

Calico, Cilium, Retina, and Netobserv: Which Observability Tool is Right for Your Kubernetes Cluster? Network observability is a tale as old as the OSI model itself and anyone who has managed a network or even a Kubernetes cluster knows the feeling: a service suddenly can’t reach its dependency, a pod is mysteriously offline, and the Slack alerts start rolling in. Investigating network connectivity issues in these complex, distributed environments can be incredibly time consuming.

Top Kubernetes interview questions of 2026: A beginners guide

Having been around for a decade, the world's most popular container orchestrator has set a standard for how we run containers at scale. According to the CNCF, cloud-native adoption has reached 98% across organizations, showing that Kubernetes adoption is not slowing down. Whether you are looking to land your first Kubernetes role or you are experienced and want to brush up on your knowledge, we’ve put together the top questions to learn more about Kubernetes.

What "Open Source" actually means in 2026

What does "Open Source" really mean in the age of AI? In the conclusion of her session at Civo Navigate India, OpenUK CEO Amanda Brock shares a fundamental truth for the global tech community. True openness is not about being local; it's about global collaboration and ensuring that technology is accessible for any purpose, without friction. As we build the next decade of innovation, the goal is to build better, together, across the planet.

When ConfigMaps Hit Limits: Migrating to CRDs

Over the past few years, Kubex has evolved from a cloud optimization product into a Kubernetes-centric solution, shifting its focus from cost and waste visibility to fully automated resource optimization. As that evolution happened, one of the earliest design decisions we had made began to show its limits: how the product was configured.

What is sovereignty washing? When cloud control is more marketing than reality

In 2025, European Commission President Ursula von der Leyen announced plans for an EU Cloud and AI Development Act, prioritizing digital sovereignty amidst growing concerns over data security and privacy. These concerns have been fueled by Edward Snowden's 2013 revelations about US surveillance and further intensified by the Trump administration's actions and rhetoric, including its criticism of EU digital regulations and threats to US tech companies.

Kubernetes Vs. OpenStack: How They Differ, How They Work Together, And When To Use Each

Kubernetes and OpenStack are not competitors. They operate at different layers of the stack and are often used together. OpenStack manages cloud infrastructure such as compute, storage, and networking. Kubernetes runs on top of that infrastructure to deploy, scale, and manage containerized applications. Teams often compare them as alternatives, but in practice, Kubernetes frequently runs on OpenStack.

The rise of the agentic future: scaling AI workflows with relaxAI and n8n

This blog is based on the webinar, “From idea to agent: Building AI workflows with relaxAI and n8n”. AI isn’t slowing down. We’re moving from “ask a chatbot” to agents that run multi-step workflows, use tools, and are built for real business processes. Most teams aren’t blocked by ideas. They’re blocked by three things: complexity, cost, and control.

Why leaders are reassessing the role of big tech in 2026

This session highlights a major strategic shift where sovereignty has moved from a technical detail to a top boardroom priority. The data reveals that 84% of leaders are concerned about geopolitical threats to their data access. 82% of respondents are ready to reassess their big tech partnerships specifically to regain data control. This shift is further evidenced by the 71% of decision-makers who now place sovereignty at the heart of their tech partner choices moving forward. The era of "sovereign-by-design" infrastructure is here. Are you ready to build for a more resilient future?

Beyond boundaries: How global collaboration defines AI in 2026

As we move through 2026, the global conversation around AI is shifting from simple adoption to a deeper focus on true openness and sovereignty. In this session from Civo Navigate India 2025, OpenUK CEO Amanda Brock explores the evolving state of AI openness and shares a significant milestone: India is now the world’s number one open-source contributing community.

AI SRE in Practice: Tracing Policy Changes to Widespread Pod Failures

Policy changes in Kubernetes are supposed to improve security, enforce standards, or optimize resource usage. But when a policy change triggers cascading pod failures across multiple namespaces, the investigation becomes a race to identify what changed before more workloads are affected.

Kubex and Tangoe Partner to Deliver Unified Cloud, Kubernetes, and FinOps Optimization

Enterprises operating at cloud scale today face a growing reality: managing infrastructure performance and cost in silos no longer works. Kubernetes, multi cloud environments, and GPU accelerated workloads deliver immense agility and capability, but they also introduce complexity that outpaces traditional monitoring and cost governance approaches.

The AI-Empowered Site Reliability Engineer: Automating the Balance of Risk and Velocity

You might expect an AI-SRE agent to target 100% reliable services, ones that never fail. It turns out that past a certain point, however, increasing reliability is worse for a service (and its users) rather than better! Extreme reliability comes at a non-linear cost: maximizing stability limits how fast new features can be developed, dramatically increases the operational cost, and reduces the features a team can afford to offer.
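To make the non-linear cost concrete, here is a small arithmetic sketch of how little downtime each additional "nine" of availability leaves. The 30-day window, the list of targets, and the function name `allowed_downtime_minutes` are illustrative assumptions, not figures from the article.

```python
# Assumption for this sketch: availability is measured over a 30-day window.
MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime permitted in 30 days at the given availability target."""
    return MINUTES_PER_30_DAYS * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Illustrative targets; each step adds one more "nine".
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability -> {minutes:.2f} minutes of downtime per 30 days")
```

Each extra nine cuts the permitted downtime by a factor of ten, from roughly 432 minutes at 99% to under half a minute at 99.999%, which is one way to see why the last increments of reliability are the most expensive to buy.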

What is the Open Container Initiative?

In this video, we explain the Open Container Initiative (OCI) and how open, vendor-neutral standards make containers portable and interoperable across platforms, tools, and environments. We cover what OCI is, why OCI compliance matters, and how OCI defines the core building blocks of the container ecosystem: container images, runtimes, and distribution.

From Blueprint to Production: Building a Kubernetes MCP Server

As Large Language Models (LLMs) evolve from simple chatbots into agentic workflows, the need for a standardized way to connect them to external data and infrastructure has become critical. In a recent workshop hosted by Nir Adler, Innovation Engineer at Komodor, we explored how to bridge this gap using the Model Context Protocol (MCP).

We Built an MCP Server

When I joined Kubex last year, the company was already well aware of the growing power of Large Language Models. As a company focused on intelligent resource optimization for Kubernetes, GPUs, and cloud infrastructure, generative AI didn’t feel like a threat so much as a natural extension of where the industry was heading. Kubex had already invested heavily in machine learning, but it was becoming clear that foundation models could unlock an entirely new class of capabilities for our customers.

Migrating from Ingress NGINX to Calico Ingress Gateway: A Step-by-Step Guide

In our previous post, we addressed the most common questions platform teams are asking as they prepare for the retirement of the NGINX Ingress Controller. With the March 2026 deadline fast approaching, this guide provides a hands-on, step-by-step walkthrough for migrating to the Kubernetes Gateway API using Calico Ingress Gateway. You will learn how to translate NGINX annotations into HTTPRoute rules, run both models side by side, and safely cut over live traffic.

#052 - The "Short Long Path": Mastering Abstraction, Culture, and Kubernetes Scale with Shemer M...

In this episode, Itiel joins forces with Shemer, Director of Platform Solutions at the gaming giant Playtika, and Scott Rosenberg, Lead Architect at TeraSky, to discuss the realities of platform engineering at a massive scale. The trio dissects Playtika’s multi-year journey from a legacy, homegrown Kubespray infrastructure to a modern, holistic platform built on Spectro Cloud, all while running strictly on-premise to support 25+ games and high-volume traffic.

Building Trust in the Machine: A Guide to Architecting Agentic AI for SRE

The promise of Artificial Intelligence in Site Reliability Engineering (SRE) is seductive: an autonomous system that never sleeps, instantly detects anomalies, and fixes broken infrastructure while humans focus on high-value work. However, the gap between a demo-ready chatbot and a production-grade Autonomous AI SRE is vast. In complex, noisy environments like Kubernetes, a “naive” implementation of Large Language Models (LLMs) is not just ineffective, it can be dangerous.

The economics of a sovereign cloud

BCG recently released a report on the cost of cloud. The findings? Hyperscalers are charging up to 30% more for their sovereign-cloud offerings. This supports the earlier notion that if you want control, compliance, and jurisdictional certainty, you have to pay a premium. At Civo, we think that is broken. As data volumes grow and AI workloads become central to business strategy, the economics of cloud computing are being re-examined.

How Civo is building the "cloud the way you want it"

As we move through 2026, the global cloud landscape is being reshaped by the drive for digital independence first discussed at Civo Navigate India 2025. This keynote featuring Mark Boost, Dinesh Majrekar, Josh Mesout, and Ben Norris laid the groundwork for a future where organizations no longer have to choose between the scale of the public cloud and the security of a private environment.

The hidden cost of "just using Kubernetes"

Kubernetes has become the default foundation for a lot of modern application infrastructure. It’s powerful, flexible, and widely supported, which makes it an obvious starting point for many teams building a cloud-native application platform (a standardized way for teams to deploy, run, secure, and operate applications in production). But there’s a distinction that often gets lost early in the decision process: Kubernetes is a framework. It is not a platform.

Calico Ingress Gateway: Key FAQs Before Migrating from NGINX Ingress Controller

We recently sat down with representatives from 42 companies to discuss a pivotal moment in Kubernetes networking: the NGINX Ingress retirement. With the March 2026 retirement of the NGINX Ingress Controller fast approaching, platform teams are now facing a hard deadline to modernize their ingress strategy.

Komodor AI SRE vs. OSS AI Agent: A Technical Comparison of Agentic AI for Kubernetes Troubleshooting

Gartner predicts that AI agents will be implemented in 60% of all IT operations tools by 2028, up from fewer than 5% at the end of 2024. This acceleration has sparked an explosion of AI SRE solutions, from enterprise platforms to open-source alternatives, all promising faster root cause analysis and reduced MTTR.