Get Kafka-Nated Episode 8

Aiven

Dec 9, 2025

Michael Drogalis, founder of ShadowTraffic and former Confluent stream-processing lead, joins Hugh Evans to tackle one of the hardest problems in Kafka: generating realistic synthetic test data. Drawing on his work on Kafka Streams, ksqlDB, and the Onyx platform, Michael shares practical insights for engineering teams struggling with test and demo data at scale.

This episode covers:
🔷 Why generating realistic test data for Kafka is uniquely challenging
🔷 The “$10,000 demo problem” and hidden costs teams face
🔷 Designing ShadowTraffic’s declarative JSON DSL: trade-offs and performance
🔷 Maintaining referential integrity and state across topics
🔷 Modeling real-world patterns like bursty traffic, seasonality, and correlated events
🔷 Key considerations for building or choosing synthetic data tools

Timestamps:

0:01 – Intro & Welcome

1:02 – Michael’s Career Journey

3:09 – Early Open Source Experience

4:02 – Challenges in Generating Realistic Kafka Data

5:55 – The “$10,000 Demo Problem”

8:28 – ShadowTraffic Design Philosophy

13:09 – Lessons from Stream Processing at Confluent

17:22 – Technical Architecture & Lookup Mechanisms

25:05 – Handling Complex Data Patterns

26:44 – Solo Founding Insights & Advice

Learn more about Aiven for Apache Kafka: https://aiven.io/kafka
Learn more about Aiven Inkless: https://aiven.io/inkless
Documentation on Synthetic Data for AI with Aiven and ShadowTraffic: https://aiven.io/developer/synthetic-data-for-ai-with-aiven-and-shadowtraffic
Watch the previous episode of Get Kafka-Nated: https://youtu.be/mBBXipQfK8w