OpenTelemetry Deep Dive: Resilience & High Availability in the OTel Collector
Missed it live?
Catch the full recording of this 1-hour workshop on building telemetry pipelines that keep signals flowing through outages, upgrades, and traffic spikes.
The session covers why resilience matters, how to design high-availability architectures, and how to configure the OpenTelemetry Collector with retries, batching, and persistent queues. It also includes live demos in both Docker and Kubernetes, including scaling Gateway collectors with a Horizontal Pod Autoscaler (HPA), and shows how Bindplane simplifies large-scale collector management.
In this session you’ll learn:
✅ The Agent–Gateway pattern for resilient pipelines
✅ Load balancing options: Nginx vs the loadbalancing exporter
✅ How retries, persistent queues, and batching prevent data loss
✅ Docker demo: HA Collector setup with Nginx
✅ Kubernetes demo: Gateway auto-scaling with HPA
✅ How Bindplane simplifies fleet-wide collector management
Whether you’re just starting with OpenTelemetry or running production workloads, this deep dive gives you the strategies and configs to keep telemetry flowing during outages, upgrades, and traffic spikes.
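For a rough idea of what the retry, batching, and persistent-queue settings look like, here is a minimal sketch of an Agent collector config. It is not the config used in the demos. The gateway endpoint, storage directory, and all numeric values are illustrative assumptions, and the file_storage extension requires a Collector distribution that bundles it (for example, the contrib distribution).

```yaml
# Illustrative sketch only: endpoint, directory, and tuning values are assumptions.
extensions:
  file_storage:
    directory: /var/lib/otelcol/storage   # hypothetical path; must exist and be writable

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    send_batch_size: 8192   # send once this many items are buffered
    timeout: 5s             # or after this long, whichever comes first

exporters:
  otlp:
    endpoint: gateway:4317  # hypothetical Gateway collector or load balancer address
    tls:
      insecure: true        # for local testing only
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s   # stop retrying a batch after 5 minutes
    sending_queue:
      enabled: true
      queue_size: 1000
      storage: file_storage    # persist the queue to disk so it survives restarts

service:
  extensions: [file_storage]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

With settings along these lines, a failed export is retried with backoff, and queued data is written to disk rather than held only in memory, so a collector restart during a gateway outage is less likely to lose telemetry.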
Chapters
00:00 Intro
02:20 Agenda
03:40 What happens when your pipeline breaks?
05:01 High Availability in the OpenTelemetry Collector
16:51 Demo 1: High Availability w/ Docker & Nginx
29:57 Demo 2: Collector-level Resilience
39:58 Demo 3: Native Load Balancing w/ Kubernetes
52:01 Architecture Summary & Takeaways
53:44 How does OpAMP work?
56:11 Q&A