DuckDB | DataFullStack

DuckDB is an in-process SQL OLAP database management system

GitHub repository

Related shared content:

650GB of Data (Delta Lake on S3). Polars vs DuckDB vs Daft vs Spark.

tech1

2025-11-12

The article discusses the challenges of processing large datasets with single-node frameworks such as Polars, DuckDB, and Daft, compared with traditional Spark clusters. It highlights the concept of 'cluster fatigue': the emotional and financial cost of running distributed systems. The author compares the performance of these frameworks on a 650GB dataset stored in Delta Lake on S3 and shows that single-node frameworks can handle large datasets without extensive resources. The findings suggest that modern Lakehouse architectures can benefit from these lightweight alternatives.

Redefining Data Engineering with Go and Apache Arrow

poc

2025-02-04

It may sound basic at first, but when we work with a big dataset, how do we handle data transformation in memory? With small data, we can simply rewrite it into Parquet format, and performance is not an issue.

In production, with: