Get Kafka-Nated Episode 8
Michael Drogalis, founder of ShadowTraffic and former Confluent stream-processing lead, joins Hugh Evans to tackle one of the hardest problems in Kafka: generating realistic synthetic test data. Drawing on his work on Kafka Streams, ksqlDB, and the Onyx platform, Michael shares practical insights for engineering teams struggling with test and demo data at scale.
This episode covers:
🔷 Why generating realistic test data for Kafka is uniquely challenging
🔷 The “$10,000 demo problem” and hidden costs teams face
🔷 Designing ShadowTraffic’s declarative JSON DSL: trade-offs and performance
🔷 Maintaining referential integrity and state across topics
🔷 Modeling real-world patterns like bursty traffic, seasonality, and correlated events
🔷 Key considerations for building or choosing synthetic data tools
Timestamps:
0:01 – Intro & Welcome
1:02 – Michael’s Career Journey
3:09 – Early Open Source Experience
4:02 – Challenges in Generating Realistic Kafka Data
5:55 – The “$10,000 Demo Problem”
8:28 – ShadowTraffic Design Philosophy
13:09 – Lessons from Stream Processing at Confluent
17:22 – Technical Architecture & Lookup Mechanisms
25:05 – Handling Complex Data Patterns
26:44 – Solo Founding Insights & Advice
Learn more about Aiven for Apache Kafka: https://aiven.io/kafka
Learn more about Aiven Inkless: https://aiven.io/inkless
Documentation on Synthetic Data for AI with Aiven and ShadowTraffic: https://aiven.io/developer/synthetic-data-for-ai-with-aiven-and-shadowtraffic
Watch the previous episode of Get Kafka-Nated: https://youtu.be/mBBXipQfK8w
#apachekafka #json #datastreaming #aiven #shadowtraffic #getkafkanated
Connect With Us
Website: http://aiven.io
LinkedIn: https://www.linkedin.com/company/aiven/
GitHub: https://github.com/aiven
X: https://twitter.com/aiven_io