Original URL: https://gradientflow.substack.com/p/paradigm-shifts-in-data-processing
Added Date: February 5, 2025
Memo: "AI-centric" data processing focuses on preparing and managing large-scale, multimodal datasets efficiently for AI model training, fine-tuning, and deployment, rather than traditional database queries. It involves optimizing computation across heterogeneous resources (CPUs/GPUs), improving data flow efficiency, and enabling scalability—all crucial for building next-generation AI models.
Original URL: https://arrow.apache.org/blog/2025/01/10/arrow-result-transfer/
Added Date: February 3, 2025
Memo:
Original URL: https://www.anthropic.com/research/building-effective-agents
Added Date: January 31, 2025
Memo: The evaluator-optimizer workflow is interesting.
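A minimal sketch of the evaluator-optimizer loop as I read it from the post: one model call drafts a response, a second call critiques it, and the draft is revised until the critique passes or a retry budget runs out. `call_llm` and the prompts are hypothetical placeholders, not Anthropic's API.

```python
# Evaluator-optimizer sketch: generate -> critique -> revise, in a loop.
# `call_llm` is a hypothetical placeholder for whatever model client you use.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

def evaluator_optimizer(task: str, max_rounds: int = 3) -> str:
    draft = call_llm(f"Complete the task:\n{task}")
    for _ in range(max_rounds):
        verdict = call_llm(
            "Review the draft below. Reply 'PASS' if it fully solves the task, "
            f"otherwise list concrete issues.\nTask: {task}\nDraft:\n{draft}"
        )
        if verdict.strip().startswith("PASS"):
            break
        draft = call_llm(
            f"Revise the draft to address this feedback.\nTask: {task}\n"
            f"Draft:\n{draft}\nFeedback:\n{verdict}"
        )
    return draft
```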
Added Date: January 30, 2025
Memo: Running local-mode Spark inside Kubernetes pods to process small files as they arrive; this is more efficient than running a big Spark cluster to process a huge number of files in batch.
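A rough PySpark sketch of the per-pod idea, assuming each pod is handed one small batch of files; the paths and output location are made up for illustration.

```python
# Each Kubernetes pod runs a tiny local[*] Spark "cluster" that handles one
# small batch of files, instead of shipping everything to a large shared
# cluster. Paths below are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[*]")          # all work stays inside this pod
    .appName("small-file-compactor")
    .getOrCreate()
)

# Read the handful of small files assigned to this pod and append them
# as a single compacted Parquet output.
df = spark.read.json("/data/incoming/batch-0001/*.json")
df.coalesce(1).write.mode("append").parquet("/data/compacted/")

spark.stop()
```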
Original URL: https://www.linkedin.com/blog/engineering/ai/automated-genai-driven-search-quality-evaluation
Added Date: January 29, 2025
Memo:
Original URL: https://engineering.fb.com/2025/01/22/security/how-meta-discovers-data-flows-via-lineage-at-scale/
Added Date: January 28, 2025
Memo: Explains how to efficiently collect and validate lineage metadata across three systems: API, data warehouse, and AI inference.
Original URL: https://www.databricks.com/blog/introducing-easier-change-data-capture-apache-spark-structured-streaming
Added Date: January 27, 2025
Memo: The State Reader API enables users to access and analyze Structured Streaming's internal state data. Readers will learn how to leverage the new features to debug, troubleshoot, and analyze state changes efficiently, making streaming workloads easier to manage at scale.
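A rough PySpark sketch of how the State Reader API is used, based on my reading of the post; the checkpoint path and option values are assumptions.

```python
# Inspecting Structured Streaming state with the State Reader API:
# the "state-metadata" source lists the stateful operators in a checkpoint,
# and the "statestore" source returns the key/value state itself.
# The checkpoint path and operatorId below are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("state-inspect").getOrCreate()

checkpoint = "/checkpoints/orders_dedup"

# High-level view: which operators hold state, and over which batch range.
metadata = spark.read.format("state-metadata").load(checkpoint)
metadata.show(truncate=False)

# Drill into the state rows of a specific operator for debugging.
state = (
    spark.read.format("statestore")
    .option("operatorId", 0)     # operator id taken from the metadata output
    .load(checkpoint)
)
state.show(truncate=False)
```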
Original URL: https://aws.amazon.com/blogs/database/how-monzo-bank-reduced-cost-of-ttl-from-time-series-index-tables-in-amazon-keyspaces/
Added Date: January 27, 2025
Memo: Monzo Bank optimized their data retention strategy in Amazon Keyspaces by replacing the traditional Time to Live (TTL) approach with a bulk deletion mechanism. By partitioning time-series data across multiple tables, each representing a specific time bucket, they can efficiently drop entire tables of expired data. This method significantly reduces operational costs associated with per-row TTL deletions.
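A rough Python sketch of the bucketed-table idea, assuming daily buckets and a connected cassandra-driver session against Keyspaces; the keyspace, table naming scheme, and retention window are made up for illustration.

```python
# Write each day's time-series rows into a table named for that day, then
# drop whole tables once they age out, instead of relying on per-row TTL.
# `session` is assumed to be a connected cassandra-driver session.
from datetime import date, timedelta

RETENTION_DAYS = 30

def table_for(day: date) -> str:
    # One table per daily bucket, e.g. events_2025_01_27
    return f"events_{day:%Y_%m_%d}"

def write_event(session, day: date, event_id: str, payload: str) -> None:
    session.execute(
        f"INSERT INTO ts.{table_for(day)} (event_id, payload) VALUES (%s, %s)",
        (event_id, payload),
    )

def drop_expired(session, today: date) -> None:
    # Bulk deletion: dropping a table removes the whole expired bucket at once,
    # avoiding per-row TTL tombstones.
    expired_day = today - timedelta(days=RETENTION_DAYS)
    session.execute(f"DROP TABLE IF EXISTS ts.{table_for(expired_day)}")
```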
Original URL: https://netflixtechblog.com/introducing-configurable-metaflow-d2fb8e9ba1c6
Added Date: January 26, 2025
Memo:
Original URL: https://www.alibabacloud.com/blog/introducing-fluss-streaming-storage-for-real-time-analytics_601921
Added Date: January 25, 2025
Memo: