OpenTelemetry Deep Dive: Resilience & High Availability in the OTel Collector

Missed it live? Catch the full recording of OpenTelemetry Deep Dive: Resilience & High Availability in the OTel Collector, a 1-hour workshop on building telemetry pipelines that never drop a signal. The workshop covers why resilience matters, how to design high-availability architectures, and how to configure the OpenTelemetry Collector with retries, batching, and persistent queues. It also includes live demos in both Docker and Kubernetes, including scaling Gateway collectors with an HPA, and shows how Bindplane makes large-scale management seamless.

Kafka Performance Crisis: How We Scaled OpenTelemetry Log Ingestion by 150%

When your telemetry pipeline starts falling behind, the countdown to production impact has already begun. One Bindplane customer operating a large-scale log ingestion pipeline built on the OpenTelemetry Collector and Kafka hit that breaking point. Instead of keeping pace with incoming data, their pipeline was ingesting just 12,000 events per second (EPS) per partition/collector—and this Kafka topic had 16 partitions. In aggregate, that was roughly 192K EPS.

Resilience with Zero Data Loss in High-Volume Telemetry Pipelines with OpenTelemetry and Bindplane

One Bindplane customer needed to process enormous S3-stored log files without losing data. Our engineering team tackled the problem head-on, adding offset tracking to the S3 event receiver and validating the change with chaos testing.