Running ML/LLM models on Kubernetes Across Major Cloud Providers with Abhishek Choudhary

Abhishek, co-founder and CTO of @truefoundry, explores the complexities of building a machine learning platform on Kubernetes. Discover solutions to challenges like handling diverse hardware, managing large Docker images, and optimizing costs. Learn how TrueFoundry uses tools like Argo CD, KEDA, and Istio to create efficient abstractions for data scientists and streamline ML operations.

Datadog on LLMs: From Chatbots to Autonomous Agents

As companies rapidly adopt Large Language Models (LLMs), understanding their unique challenges becomes crucial. Join us for a special episode of "Datadog On LLMs: From Chatbots to Autonomous Agents," streaming directly from DASH 2024 on Wednesday, June 26th, to discuss this important topic. In this live session, host Jason Hand will be joined by Othmane Abou-Amal from Datadog’s Data Science team and Conor Branagan from the Bits AI team. Together, they will explore the fascinating world of LLMs and their applications at Datadog.

Reward engineers who fix problems before they cause outages

Are you recognizing the good work engineers do to prevent outages? "The people that are out there doing good work to prevent fires from ever occurring, we're not often recognizing them. We're not often rewarding them. And once things go wrong, someone comes in and fixes it. That's great. That's needed. But we're rewarding that behavior. And so it becomes a bit of people are motivated by what behavior you reward."

Reliability-Driven Fleet Management with Komodor

Maintaining a few K8s clusters is hard enough. Maintaining 1000+ clusters is virtually impossible without embracing new tooling and paradigm shifts. Join us for an insightful LIVE workshop exploring the possibilities of Kubernetes Fleet Management with Komodor, led by Itiel Shwartz. In this session, we will dive into the challenges of multi-cluster management and how Komodor's comprehensive platform simplifies operations. Discover how to gain real-time visibility into your clusters, automate routine tasks, and troubleshoot issues across your entire fleet efficiently.