
Kafka

Collecting Kafka performance metrics

If you’ve already read our guide to key Kafka performance metrics, you’ve seen that Kafka provides a vast array of metrics on performance and resource utilization, exposed in a number of different ways. You’ve also seen that no Kafka performance monitoring solution is complete without also monitoring ZooKeeper. This post covers several options for collecting Kafka and ZooKeeper metrics, depending on your needs.
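
As a taste of one such option, here is a minimal sketch that reads a single broker throughput metric over JMX. It assumes a broker running locally with JMX enabled on port 9999; the MBean shown (MessagesInPerSec) is one of the standard broker throughput metrics.

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class BrokerMetricSample {
        public static void main(String[] args) throws Exception {
            // Assumes the broker was started with JMX enabled, e.g. JMX_PORT=9999.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbeans = connector.getMBeanServerConnection();
                // One of the standard broker throughput metrics.
                ObjectName messagesIn = new ObjectName(
                        "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec");
                Object rate = mbeans.getAttribute(messagesIn, "OneMinuteRate");
                System.out.println("MessagesInPerSec (1-minute rate): " + rate);
            } finally {
                connector.close();
            }
        }
    }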

Monitoring Kafka with Datadog

Kafka deployments often rely on additional software packages not included in the Kafka codebase itself—in particular, Apache ZooKeeper. A comprehensive monitoring implementation includes all the layers of your deployment so you have visibility into your Kafka cluster and your ZooKeeper ensemble, as well as your producer and consumer applications and the hosts that run them all.
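
As a reminder that the client applications are a monitoring layer in their own right (separate from any particular vendor’s integration), here is a minimal sketch that dumps a Java producer’s client-side metrics; the local bootstrap address is an assumption.

    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.common.Metric;
    import org.apache.kafka.common.MetricName;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ProducerMetricsDump {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Java clients expose their own metrics (request latency, batch size,
                // record error rate, ...) alongside the broker-side JMX metrics.
                Map<MetricName, ? extends Metric> metrics = producer.metrics();
                metrics.forEach((name, metric) -> System.out.printf(
                        "%s / %s = %s%n", name.group(), name.name(), metric.metricValue()));
            }
        }
    }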

5 Pitfalls to Kafka Architecture Implementation

Let’s face it: distributed streaming is an exciting technology that can be leveraged in many ways. Use cases include messaging, log aggregation, distributed tracing, and event sourcing, among others. Distributed streaming can deliver significant benefits for the companies that adopt it, but, when implemented incorrectly, it can set off a frustrating cycle of technical debt. How do you know whether you’re implementing Kafka properly in your environment?

Monitor Amazon Managed Streaming for Apache Kafka with Datadog

Amazon Managed Streaming for Apache Kafka (MSK) is a fully managed service that allows developers to build highly available and scalable applications on Kafka. In addition to enabling developers to migrate their existing Kafka applications to AWS, Amazon MSK handles the provisioning and maintenance of Kafka and ZooKeeper nodes and automatically replicates data across multiple availability zones for high availability.
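
As a rough sketch of what “migrating an existing application” can amount to, the snippet below points a standard Java producer at a placeholder MSK bootstrap string over TLS. The broker hostnames, topic name, and the 9094 TLS port are assumptions to replace with your cluster’s actual values.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class MskProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder bootstrap string; use the value the AWS console or CLI
            // reports for your cluster. TLS listeners commonly use port 9094.
            props.put("bootstrap.servers",
                    "b-1.example.kafka.us-east-1.amazonaws.com:9094,"
                  + "b-2.example.kafka.us-east-1.amazonaws.com:9094");
            props.put("security.protocol", "SSL");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("orders", "order-42", "created"));
                producer.flush();
            }
        }
    }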

Kafka Data Pipelines for Machine Learning Enterprise Applications

Traditional enterprise application platforms are usually built with Java Enterprise technologies, and OpsRamp is no exception. In the machine learning (ML) world, however, Python is the most commonly used language and Java is rarely used. To develop ML components within an enterprise platform, such as the AIOps capabilities in OpsRamp, we have to run the ML components as Python microservices that communicate with the platform’s Java microservices.
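
A minimal sketch of the Java side of such a bridge, assuming a hypothetical ml-inference-requests topic and a local broker: the key design choice is a language-neutral payload (plain JSON here), so the Python consumers can deserialize events without sharing any Java classes.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class InferenceRequestPublisher {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            // Language-neutral payload: a Python microservice can consume and
            // json.loads() this without any shared Java types.
            String payload = "{\"alertId\":\"a-123\",\"metric\":\"cpu.load\",\"value\":0.93}";

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("ml-inference-requests", "a-123", payload));
                producer.flush();
            }
        }
    }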

Avoiding death by external side effects - a tale of Kafka Streams

At Coralogix, we strive to ensure that our customers get a stable, real-time service at scale. As part of this commitment, we are constantly improving our data ingestion pipeline resiliency and performance. Coralogix ingests messages at extremely high rates — up to tens of billions of messages per day. Every one of these records needs to go through our entire pipeline at near real-time rates: validation, parsing, classification, and ingestion to Elasticsearch.
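
For orientation only (this is not Coralogix’s code), here is a heavily simplified Kafka Streams topology with a validation and a parsing stage; the application id, topic names, and local broker address are assumptions.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class LogPipelineTopology {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "log-pipeline");       // hypothetical id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumption
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> raw = builder.stream("raw-logs");             // hypothetical topic

            raw.filter((key, value) -> value != null && !value.isEmpty())         // validation
               .mapValues(value -> value.trim())                                  // parsing (simplified)
               .to("parsed-logs");                                                // hand off downstream

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }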

Lessons learned from running Kafka at Datadog

At Datadog, we operate 40+ Kafka and ZooKeeper clusters that process trillions of datapoints across multiple infrastructure platforms, data centers, and regions every day. Over the course of operating and scaling these clusters to support increasingly diverse and demanding workloads, we’ve learned a lot about Kafka—and what happens when its default behavior doesn’t align with expectations.
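
As a generic illustration of that point, rather than the article’s specific lessons, the snippet below sets a few well-known client defaults explicitly instead of relying on their built-in values.

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.producer.ProducerConfig;

    public class ExplicitDefaults {
        public static void main(String[] args) {
            // Consumer: by default auto.offset.reset is "latest" and offsets are
            // auto-committed, which can silently skip or re-read data after restarts.
            Properties consumerProps = new Properties();
            consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
            consumerProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

            // Producer: acks=all plus idempotence trades some latency for stronger
            // delivery guarantees than the defaults in older client versions.
            Properties producerProps = new Properties();
            producerProps.put(ProducerConfig.ACKS_CONFIG, "all");
            producerProps.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

            System.out.println("consumer overrides: " + consumerProps);
            System.out.println("producer overrides: " + producerProps);
        }
    }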

A Complete Introduction to Apache Kafka

Kafka is an open source, real-time streaming messaging system and protocol built around the publish-subscribe model. In this model, producers publish data to feeds to which consumers subscribe. With Kafka, clients within a system can exchange information with higher performance and a lower risk of serious failure. Instead of establishing direct connections between subsystems, clients communicate via a server that brokers the information between producers and consumers.
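
A minimal sketch of the subscriber side of that model, assuming a local broker, a hypothetical orders topic, and a hypothetical consumer group:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class FeedSubscriber {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
            props.put("group.id", "demo-subscribers");        // hypothetical consumer group
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Subscribe to a topic ("feed") rather than connecting to the
                // producer directly: the broker mediates the exchange.
                consumer.subscribe(Collections.singletonList("orders"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("key=%s value=%s partition=%d offset=%d%n",
                                record.key(), record.value(), record.partition(), record.offset());
                    }
                }
            }
        }
    }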

Create Kafka Topics in 3 Easy Steps

Creating a topic in production is an operational task that requires awareness and preparation. In this tutorial, we’ll explain all the parameters to consider when creating a new topic in production. Setting the partition count and replication factor is required when creating a new topic, and these choices affect the performance and reliability of your system.
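
A minimal sketch of creating such a topic programmatically with the Java AdminClient; the topic name, partition count, and replication factor are illustrative, and a replication factor of 3 assumes a cluster with at least three brokers.

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateTopicExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumption: local cluster

            try (AdminClient admin = AdminClient.create(props)) {
                // Partition count and replication factor are the two required choices:
                // more partitions increase parallelism, a higher replication factor
                // increases durability (bounded by the number of brokers).
                NewTopic topic = new NewTopic("orders", 6, (short) 3);
                admin.createTopics(Collections.singletonList(topic)).all().get();
            }
        }
    }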