Original URL: https://medium.com/adevinta-tech-blog/from-lakehouse-architecture-to-data-mesh-c532c91f7b61
Added Date: March 4, 2025
Memo: Modern data platform architecture based on the Databricks tech stack.
Original URL: https://aws.amazon.com/blogs/big-data/design-patterns-for-implementing-hive-metastore-for-amazon-emr-on-eks/
Added Date: March 3, 2025
Memo:
Original URL: https://medium.com/blablacar/data-pipelines-architecture-at-blablacar-3ca43403cb39
Added Date: February 28, 2025
Memo: A very classic modern data stack (MDS).
Original URL: https://medium.com/@mcgeehan/redefining-data-engineering-with-go-and-apache-arrow-df9059ddf55c
Added Date: February 27, 2025
Memo: Sounds very fast, but if we work with big datasets, how do we handle data transformations in memory? If we work with small data, we can rewrite it into Parquet format and performance is not an issue.
Original URL: https://medium.com/@community_md101/data-products-a-case-against-medallion-architecture-139096ceea08
Added Date: February 26, 2025
Memo: I read the blog but wasn’t fully convinced by its main argument. In my view, Medallion Architecture is just one way to manage data, and it doesn’t necessarily require physically moving or copying data between stages. Simply tagging tables should be sufficient. Different stages can enforce distinct archival and retention policies and operational processes. Additionally, from a high-level perspective, the concept of data products doesn’t fundamentally contradict Medallion Architecture.
Original URL: https://jack-vanlightly.com/blog/2025/2/17/towards-composable-data-platforms
Added Date: February 25, 2025
Memo: My understanding is that "Table Virtualization" means sharing tables between two data platforms.
Added Date: February 25, 2025
Memo: An interesting architecture for handling bursty and unpredictable traffic on AWS.
Original URL: https://medium.com/@ApacheDolphinScheduler/open-source-data-engineering-landscape-2025-db53ce18d53d
Added Date: February 24, 2025
Memo: A really good summary of the main tech products across the different categories of the data industry!
Original URL: https://www.generativevalue.com/p/the-unstructured-data-landscape
Added Date: February 23, 2025
Memo:
Added Date: February 21, 2025
Memo: