Over the last few months, Honeycomb’s platform team migrated to a new iteration of our ingest pipeline for customer events. Our migration to this newer architecture did not go too smoothly, as can be attested by our status page since February. There were also many near-incidents where we got paged and reacted quickly enough to avoid major issues. We’ve decided to write a full overview of all the challenges we had encountered, which you can can download.
Today, we are pleased to announce a partnership with Confluent to jointly develop and deliver an enhanced product experience to the Kafka-Elasticsearch community. Kafka is — and has been since the very early days — an important component of the Elastic ecosystem.
Kafka is a distributed, highly available event streaming platform which can be run on bare metal, virtualized, containerized, or as a managed service. At its heart, Kafka is a publish/subscribe (or pub/sub) system, which provides a "broker" to dole out events. Publishers post events to topics, and consumers subscribe to topics. When a new event is sent to a topic, consumers that subscribe to the topic will receive a new event notification.
Running systems in production involves requirements for high availability, resilience and recovery from failure. When running cloud native applications this becomes even more critical, as the base assumption in such environments is that compute nodes will suffer outages, Kubernetes nodes will go down and microservices instances are likely to fail, yet the service is expected to remain up and running.
I have seen many junior data scientists and machine learning engineers start a new job or a consulting engagement for a telecom company coming from different industries and thinking that it’s yet another project like many others. What they usually don’t know is that “It’s a trap!”. I spent several years forging telecom data into valuable insights, and looking back, there are a couple of things I would have loved to know at the beginning of my journey.
Data streaming is important for getting insights in real time and reacting to events as fast as possible. Its application is wide, from banking transactions and website click analytics to IoT devices and motorsports. The last example represents a really interesting use case.
As software is evolving away from monoliths and towards service-based architectures, it is becoming more apparent than ever that logging performance needs to be a first-class consideration of our architectural designs. This article will explore how to build and maintain a logging solution for a microservice-oriented containerized application, how to address some common difficulties which come with running such a solution, plus some tips for debugging and eliminating bottlenecks.