Aggregating Millions of Groups Fast in Apache Arrow DataFusion
Grouped aggregations are a core part of any analytic tool, creating understandable summaries of huge data volumes. In version 28.0.0, Apache Arrow DataFusion’s parallel aggregation is 2-3x faster than in earlier releases for queries with a large number of groups (10,000 or more).