#049 - The AI Translator: Using LLMs & MCP for K8s Operations & Self-Healing Infra with Alexei Le...

In this episode, Itiel Shwartz kicks off a series on MLOps, LLMs, and GenAI in Kubernetes. The first guest is Alexei Ledenev, who has over two decades of software development experience and deep expertise in cloud architecture and distributed systems. He shares his journey from CoreOS Fleet to his current role on the Platform Team at Doit.

Introducing Puppet Edge: The Future of Network and Edge Device Management

Discover how Puppet Edge is revolutionizing infrastructure management by combining declarative and imperative approaches into one powerful solution. In this video, Margaret Lee, a product leader at Puppet, introduces Puppet Edge and explains how it simplifies the management of your entire estate, from network to edge devices.

Kubernetes v1.34: What You Need to Know

Kubernetes v1.34, codenamed “Of Wind & Will (O’ WaW)”, brings a wide range of enhancements aimed at making clusters more efficient, secure, and easier to manage. This release delivers 58 enhancements with 23 graduating to Stable, 22 entering Beta, and 13 in Alpha, reflecting the platform’s continued maturation as enterprises scale their container orchestration needs.

Kubernetes monitoring explained: Key metrics, labels, and best practices

Monitoring Kubernetes and containers doesn’t have to be overwhelming. In this video, we’ll break down the essential metrics you need to track, why labels are critical for container visibility, and the best practices for Kubernetes monitoring at scale. You’ll also learn how tools like Site24x7 simplify Kubernetes monitoring with auto-discovery, dashboards, anomaly detection, and forecasting. Whether you’re a DevOps engineer, SRE, or developer, this video gives you the practical knowledge to improve container monitoring and observability.

Creating and using a Network Discovery Profile in Site24x7

Learn how to create and use a Discovery Profile in Site24x7 to simplify and automate network device onboarding. In this video, we walk you through setting up discovery parameters, applying filters and thresholds, grouping and tagging devices, configuring alerts, integrating with ITSM and collaboration tools, and scheduling periodic rediscovery. Whether you're managing a single site or multiple customer environments, Discovery Profiles help you.

How to Responsibly and Effectively Contribute to Open Source Using AI

With the influx of AI tooling, it’s never been easier to contribute to open source communities. These tools are capable of gathering context quickly, “understanding” repositories faster than ever before. They provide instant summaries about repositories that, previously, would have meant reading lines and lines of code. They can fix bugs in programming languages you don’t know, and ultimately allow more contributors to get involved, which (almost) every open source project wants.

Kubernetes Observability: Your Q&A Guide to Calico Whisker

Getting the most out of Whisker requires understanding its inner workings, and this guide is designed to help you master the tool with support from the Calico community. We’ve compiled the most frequently asked questions from our community Slack, support conversations, and CalicoCon sessions. This Q&A covers everything from initial installation tips and version requirements to advanced topics like filtering flow logs and integrating with Goldmane, the API that underpins Whisker.

You don't need a real outage to find your weak spots.

Modern digital services rely on complex systems, and chaos can strike at any layer. But the most effective teams don’t wait for failure to learn. They simulate it. By introducing controlled performance degradations, you can stress your systems, test your dependencies, and uncover hidden risks without touching production. In our latest webinar, Catchpoint experts walk through how teams are building resilience through proactive, safe failure testing, and why it’s become a cornerstone of digital reliability.