650GB of Data (Delta Lake on S3). Polars vs DuckDB vs Daft vs Spark.
Original URL: https://dataengineeringcentral.substack.com/p/650gb-of-data-delta-lake-on-s3-polars
Article Written: November 12, 2025
Added: November 24, 2025
Type: tech1
Summary
The article compares single-node frameworks (Polars, DuckDB, and Daft) with a traditional Spark cluster for processing large datasets. It introduces the concept of 'cluster fatigue': the emotional and financial cost of running distributed systems. The author runs a performance comparison of these frameworks on a 650GB dataset stored in Delta Lake on S3, and shows that the single-node frameworks can handle a dataset of this size without extensive resources. The findings suggest that modern Lakehouse architectures can benefit from these lightweight alternatives.