Tech tags:
Related shared contents:
-
tech12025-11-12
The article discusses the challenges of processing large datasets using single-node frameworks like Polars, DuckDB, and Daft compared to traditional Spark clusters. It highlights the concept of 'cluster fatigue' and the emotional and financial costs associated with running distributed systems. The author conducts a performance comparison of these frameworks on a 650GB dataset stored in Delta Lake on S3, demonstrating that single-node frameworks can effectively handle large datasets without the need for extensive resources. The findings suggest that modern Lake House architectures can benefit from these lightweight alternatives.
-
poc2025-02-04
Sounds very first, but if we work with big dataset, how to handle the data transformation in the memory? If we work with small data, we can rewrite into Parquet format and the performance is not an issue.
-
In productions with: