OpenTelemetry Deep Dive: Resilience & High Availability in the OTel Collector

Aug 28, 2025

Missed it live?

Catch the full recording of OpenTelemetry Deep Dive: Resilience & High Availability in the OTel Collector — a 1-hour workshop on building telemetry pipelines that never drop a signal.

You’ll see why resilience matters, how to design high-availability architectures, and how to configure the OpenTelemetry Collector with retries, batching, and persistent queues. The recording also includes live demos in both Docker and Kubernetes, including scaling Gateway collectors with a Horizontal Pod Autoscaler (HPA), and shows how Bindplane simplifies managing large collector fleets.
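For intuition before watching, here is a minimal Python sketch of two mechanisms the session configures: retries with exponential backoff and a bounded sending queue. This is not Collector code. The interval, multiplier, and cap values are illustrative assumptions loosely modeled on the Collector's retry_on_failure settings (check your Collector version's documentation for the real defaults), SendingQueue is a hypothetical name, and writing the queue to disk (what a persistent queue adds) is not modeled.

```python
import random
from collections import deque

# Illustrative values only (assumption): loosely modeled on the Collector's
# retry_on_failure settings, not read from any Collector release.
INITIAL_INTERVAL = 5.0   # seconds to wait before the first retry
MULTIPLIER = 1.5         # growth factor applied after each attempt
MAX_INTERVAL = 30.0      # cap on any single wait
MAX_ELAPSED = 300.0      # total waiting budget before giving up on a batch


def backoff_schedule(jitter=0.0, rng=None):
    """Return the waits before each retry until the MAX_ELAPSED budget is spent.

    jitter is a randomization factor in [0, 1); 0 yields a deterministic schedule.
    """
    rng = rng or random.Random(0)
    waits, interval, elapsed = [], INITIAL_INTERVAL, 0.0
    while True:
        delta = jitter * interval
        wait = rng.uniform(interval - delta, interval + delta)
        if elapsed + wait > MAX_ELAPSED:
            break  # budget exhausted: the batch would be dropped here
        waits.append(wait)
        elapsed += wait
        interval = min(interval * MULTIPLIER, MAX_INTERVAL)
    return waits


class SendingQueue:
    """Hypothetical bounded in-memory queue; new batches are rejected when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.dropped = 0

    def offer(self, batch):
        # While the backend is down, batches pile up here; once the queue is
        # full, anything new is counted as dropped.
        if len(self.items) >= self.capacity:
            self.dropped += 1
            return False
        self.items.append(batch)
        return True

    def poll(self):
        # Oldest batch first, or None if the queue is empty.
        return self.items.popleft() if self.items else None
```

With jitter=0 the schedule is 5, 7.5, 11.25, 16.875, and 25.3125 seconds, then 30 seconds repeatedly until the 300-second budget runs out: 12 retries in total. Anything that arrives while the queue is full is lost, which is why queue sizing, and persisting the queue to disk so it survives a Collector restart, matter during long outages.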

In this session you’ll learn:
✅ The Agent–Gateway pattern for resilient pipelines
✅ Load balancing options: Nginx vs the loadbalancing exporter
✅ How retries, persistent queues, and batching prevent data loss
✅ Docker demo: HA Collector setup with Nginx
✅ Kubernetes demo: Gateway auto-scaling with HPA
✅ How Bindplane simplifies fleet-wide collector management
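One reason the session compares Nginx with the loadbalancing exporter is routing: a round-robin proxy spreads the spans of a single trace across different Gateways, while the loadbalancing exporter routes by a key such as the trace ID so a whole trace lands on one Gateway. The sketch below illustrates that idea with a consistent-hash ring. It is not the exporter's implementation; the SHA-256 hash, the 100 virtual nodes per backend, and the gateway names are arbitrary assumptions for illustration.

```python
import hashlib
from bisect import bisect
from itertools import cycle


class HashRing:
    """Consistent-hash ring mapping a routing key (e.g. a trace ID) to a backend."""

    def __init__(self, backends, vnodes=100):
        # Each backend gets several points on the ring to even out the load.
        self.ring = sorted(
            (self._hash(f"{b}#{i}"), b) for b in backends for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

    def route(self, trace_id):
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect(self.keys, self._hash(trace_id)) % len(self.ring)
        return self.ring[idx][1]


# Hypothetical gateway names, for illustration only.
gateways = ["gateway-0", "gateway-1", "gateway-2"]
ring = HashRing(gateways)
round_robin = cycle(gateways)

trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
spans_of_one_trace = range(3)
print({ring.route(trace_id) for _ in spans_of_one_trace})   # a single gateway
print({next(round_robin) for _ in spans_of_one_trace})      # all three gateways
```

A consistent-hash ring also limits churn when the Gateway tier scales: if a Gateway is removed, only the traces that were routed to it move, and every other trace keeps its Gateway.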

Whether you’re just starting with OpenTelemetry or running production workloads, this deep dive gives you the strategies and configs to keep telemetry flowing during outages, upgrades, and traffic spikes.

Chapters

00:00 Intro

02:20 Agenda

03:40 What happens when your pipeline breaks?

05:01 High Availability in the OpenTelemetry Collector

16:51 Demo 1: High Availability w/ Docker & Nginx

29:57 Demo 2: Collector-level Resilience

39:58 Demo 3: Native Load Balancing w/ Kubernetes

52:01 Architecture Summary & Takeaways

53:44 How does OpAMP work?

56:11 Q&A