5 Pitfalls to Kafka Architecture Implementation

Let’s face it: distributed streaming is an exciting technology that can be put to many uses, including messaging, log aggregation, distributed tracing, and event sourcing. It can bring significant benefits to the companies that adopt it, but when it is implemented incorrectly, it can start a frustrating cycle of technical debt. How do you know whether you’re implementing Kafka properly in your environment?


Monitor Amazon Managed Streaming for Apache Kafka with Datadog

Amazon Managed Streaming for Apache Kafka (MSK) is a fully managed service that allows developers to build highly available and scalable applications on Kafka. In addition to enabling developers to migrate their existing Kafka applications to AWS, Amazon MSK handles the provisioning and maintenance of Kafka and ZooKeeper nodes and automatically replicates data across multiple availability zones for high availability.


Kafka Data Pipelines for Machine Learning Enterprise Applications

Traditional enterprise application platforms are usually built with Java Enterprise technologies, and OpsRamp is no exception. In the machine learning (ML) world, however, Python is by far the most common language, and Java is rarely used. To develop ML components within enterprise platforms, such as the AIOps capabilities in OpsRamp, we have to run those components as Python microservices that communicate with the platform’s Java microservices.


How Humio leverages Kafka and brute-force search to get blazing-fast search results

Humio is purpose-built to aggregate and retain billions of streaming logs, then analyze and visualize them to determine the health of the environment, something we describe as “feeling the hum of the system.” Humio developers tenaciously optimize data ingest, retention, compression, and storage to take full advantage of modern hardware.


Adding NiFi and Kafka to Cloudera Data Platform

Following the big launch of Cloudera Data Platform (CDP), we are excited to introduce some of our key streaming products on the same platform. Apache NiFi and Apache Kafka will be added to CDP Data Hub shortly. We are introducing these capabilities first in the public cloud through CDP Data Hub, and will then add support across CDP’s broader deployment spectrum, including CDP Data Center and CDP Private Cloud.


Distribute Messages Between Java Microservices Using Kafka

Kafka is a fast streaming platform well suited to high-volume data streams. This article is a technical guide that walks you through the steps needed to distribute messages between Java microservices using Kafka. We conclude with a brief look at some performance benchmarks.


A look inside Kafka Mirrormaker 2

In our previous blog post, A Case for Mirrormaker 2, we discussed how enterprises rely on Apache Kafka as an essential component of their data pipelines. We also discussed how they require data availability and durability guarantees that hold up even when an entire cluster or datacenter fails. As that post explained, the current Apache Kafka solution, MirrorMaker 1, has known limitations as an enterprise-managed disaster recovery solution.


Avoiding death by external side effects - a tale of Kafka Streams

At Coralogix, we strive to ensure that our customers get a stable, real-time service at scale. As part of this commitment, we are constantly improving the resiliency and performance of our data ingestion pipeline. Coralogix ingests messages at extremely high rates: up to tens of billions of messages per day. Every one of these records must pass through our entire pipeline at near real-time rates, covering validation, parsing, classification, and ingestion into Elasticsearch.


Kafka, RabbitMQ or Kinesis - Solution Comparison

As you move away from monolithic applications and toward streaming data processing, you’ll undoubtedly have to compare three solutions that each tackle the distributed messaging problem in a different way. Kinesis, Kafka, and RabbitMQ all allow you to build distributed tracing systems for your serverless applications. But which should you choose?


InfluxDB and Kafka: How InfluxData Uses Kafka in Production

Following CTO Paul Dix’s original release announcement for InfluxDB 2.0 and the release of InfluxDB Cloud 2.0 to public beta, I thought the community would be interested in learning how InfluxData provides multi-tenant, horizontally scalable time series storage.