
The latest news and information on containers, Kubernetes, Docker, and related technologies.

Monitor LLM routing with the Kubernetes Inference Extension

If you serve LLMs on Kubernetes without inference-aware routing, your load balancer is likely wasting inference capacity. Generic HTTP traffic management routes requests blindly, assuming the backends in your cluster are interchangeable. But model-serving backends are stateful and unevenly prepared to handle any given request. As a result, requests often land on a backend that is not the one best suited to respond.

Konstruct product updates: Global resources, MCP support, and smarter permissions

May has been one of our busiest months yet for Konstruct. Across three releases (0.5, 0.5.1, and 0.5.2), we've shipped some of the most requested platform-level changes since we launched: a unified model for sharing resources across organizations, native support for AI-driven workflows via MCP, a completely redesigned API keys experience, and a cleanup of how permissions work in multi-org environments. Let's walk through what shipped and why it matters.

Deploy Datadog Kubernetes Autoscaling at scale

Every Kubernetes environment accumulates waste over time. Teams overprovision CPU and memory requests to avoid performance risk, run idle replicas to preserve headroom, and leave Horizontal Pod Autoscalers (HPAs) untouched long after workload behavior has changed. Some of this waste can be addressed at the node level, where Datadog Cluster Autoscaling helps teams rightsize capacity.

Secure execution: Agents in sandboxes with relaxAI

The hard part of deploying AI agents isn't the agent. It's the environment around it. As organisations move from AI experimentation into production, the question isn't just what agents can do; it's whether you can trust the environment they run in. Sandboxed execution gives you both the autonomy and the guardrails, keeping agents isolated, auditable, and under your control.

Digital sovereignty: Who's in control?

Digital sovereignty isn't a marketing buzzword. It's about jurisdiction, accountability, and operational certainty, and it starts with where your data is hosted and how it's processed. Civo's UK sovereign cloud delivers public cloud, private cloud, and AI services, all hosted and operated exclusively within the United Kingdom under UK legal authority, with no exposure to foreign control.

The Kubeshark Workflow That Doesn't Stop at the Dashboard

The Observability Gap shows up the moment you try to reproduce a production bug locally. Your traces tell you a request was slow. Your logs tell you which line printed. Neither tells you what was actually on the wire: the headers, the JSON body, the surprise field your client started sending last Tuesday. Until now, closing that gap meant SSHing to a node, attaching a debugger, or shipping a sidecar through change review.

The inside scoop on alerting changes in Kubernetes Monitoring

Kubernetes Monitoring in Grafana Cloud comes out of the box with preconfigured alert rules that notify you about issues like CPU throttling, crash-looping pods, and nodes going offline. These rules are installed automatically when you set up the app, and they start evaluating immediately. But if you've recently reinstalled the Kubernetes Monitoring app and your alert notifications stopped arriving, or started looking different, you're not alone.

Hybrid Cloud Monitoring Explained: On-Prem + Cloud + Kubernetes in One View

Understand what hybrid cloud monitoring is and why it’s critical for managing modern distributed IT environments. Hybrid cloud monitoring helps organizations unify visibility across on-prem infrastructure, public cloud platforms, virtual machines, containers, and Kubernetes clusters in a single monitoring platform. In this video, learn how fragmented monitoring tools create operational blind spots and slow down incident response across hybrid environments.

The AI Agent Accountability Gap: Why Network Policies, API Gateways, And RBAC Are Not Enough

In The Five Pillars of AI Agent Accountability: A Diagnostic Framework for Engineering Leaders, we walked through each pillar of AI agent accountability (traceability, authorization provenance, identity and ownership, policy at scale, and human oversight) and argued that most enterprises today sit at Level 0 or Level 1 of the Accountability Maturity Model. The most common reaction we get when we share that framework is some version of: “We’re already covered. We have network policies.

The Case for VM and Container Consolidation in 2026

Two platforms, two teams, two procurement relationships, all doing one job. There’s a reason it ended up this way. There isn’t a reason it has to stay this way. Ask anyone at a typical enterprise why the VM platform and the container platform are separate, and they’ll give you a sensible answer. The VM estate has been there for fifteen years. It runs the workloads the business depends on.