Original URL: https://medium.com/blablacar/data-pipelines-architecture-at-blablacar-3ca43403cb39
Added Date: February 28, 2025
Memo: A very classic MDS (Modern Data Stack) setup.
Original URL: https://medium.com/@mcgeehan/redefining-data-engineering-with-go-and-apache-arrow-df9059ddf55c
Added Date: February 27, 2025
Memo: Sounds very fast, but if we work with a big dataset, how do we handle the data transformation in memory? If we work with small data, we can just rewrite it into Parquet format and performance is not an issue.
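One common answer to the memory question is to stream the data in fixed-size batches, so only one batch (plus any running aggregates) is held in memory at a time. Arrow itself organizes data into record batches, but this is a minimal sketch in plain Python rather than Go/Arrow. The function names, the `country`/`amount` columns, and the batch size are hypothetical.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator


def iter_batches(rows: Iterable[dict], batch_size: int) -> Iterator[list]:
    """Yield lists of at most batch_size rows; only one batch is materialized at a time."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def total_amount_by_country(source: io.TextIOBase, batch_size: int = 10_000) -> dict:
    """Hypothetical transformation: sum 'amount' per 'country', batch by batch."""
    totals: dict = {}
    reader = csv.DictReader(source)  # reads the input lazily, row by row
    for batch in iter_batches(reader, batch_size):
        for row in batch:
            totals[row["country"]] = totals.get(row["country"], 0.0) + float(row["amount"])
    return totals


if __name__ == "__main__":
    data = io.StringIO("country,amount\nFR,10\nDE,5\nFR,2.5\n")
    print(total_amount_by_country(data, batch_size=2))  # {'FR': 12.5, 'DE': 5.0}
```

Memory use is bounded by `batch_size` and the size of the aggregate, not by the size of the input file, which is the trade-off that makes this pattern work for datasets larger than RAM.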
Original URL: https://medium.com/@community_md101/data-products-a-case-against-medallion-architecture-139096ceea08
Added Date: February 26, 2025
Memo: I read the blog post but wasn’t fully convinced by its main argument. In my view, Medallion Architecture is just one way to manage data, and it doesn’t necessarily require physically moving or copying data between different stages. Simply tagging tables should be sufficient. Different stages can enforce distinct archival and retention policies, as well as operational processes. Additionally, from a high-level perspective, the concept of data products doesn’t fundamentally contradict Medallion Architecture.
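The idea that stages can be tags rather than copies can be sketched like this: each table keeps a single physical location, its stage is only metadata, and archival/retention policies are looked up from the stage. This is a minimal sketch under stated assumptions; the stage names, policy numbers, and the `s3://lake/...` path are hypothetical.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class Stage(Enum):
    BRONZE = "bronze"
    SILVER = "silver"
    GOLD = "gold"


@dataclass(frozen=True)
class StagePolicy:
    retention_days: int
    archive_after_days: Optional[int]  # None means "never archived"


# Hypothetical per-stage policies; the numbers are illustrative only.
POLICIES = {
    Stage.BRONZE: StagePolicy(retention_days=30, archive_after_days=7),
    Stage.SILVER: StagePolicy(retention_days=365, archive_after_days=90),
    Stage.GOLD: StagePolicy(retention_days=3650, archive_after_days=None),
}


@dataclass(frozen=True)
class Table:
    name: str
    location: str  # physical location; unchanged when the stage changes
    stage: Stage


def promote(table: Table, new_stage: Stage) -> Table:
    """Change a table's stage by re-tagging it; no data is moved or copied."""
    return replace(table, stage=new_stage)


def policy_for(table: Table) -> StagePolicy:
    """Look up the archival/retention policy that applies to the table's stage."""
    return POLICIES[table.stage]


if __name__ == "__main__":
    orders = Table("orders", "s3://lake/orders/", Stage.BRONZE)  # hypothetical path
    silver = promote(orders, Stage.SILVER)
    print(silver.location == orders.location, policy_for(silver).retention_days)  # True 365
```

Promotion only rewrites a tag, so the stage boundary becomes a governance concept (which policy applies) rather than a storage one.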
Original URL: https://jack-vanlightly.com/blog/2025/2/17/towards-composable-data-platforms
Added Date: February 25, 2025
Memo: My understanding of "Table Virtualization" is sharing the same tables between two data platforms.
Added Date: February 25, 2025
Memo: An interesting architecture for handling bursty and unpredictable traffic on AWS.
Original URL: https://medium.com/@ApacheDolphinScheduler/open-source-data-engineering-landscape-2025-db53ce18d53d
Added Date: February 24, 2025
Memo: A really good summary of the main tech products across the different categories of the data industry!
Original URL: https://www.generativevalue.com/p/the-unstructured-data-landscape
Added Date: February 23, 2025
Memo:
Added Date: February 21, 2025
Memo:
Original URL: https://aws.amazon.com/blogs/machine-learning/how-formula-1-uses-generative-ai-to-accelerate-race-day-issue-resolution/
Added Date: February 20, 2025
Memo: A very classic Glue job pipeline that feeds Amazon Bedrock Knowledge Bases for a RAG use case.
Original URL: https://cloud.google.com/blog/products/data-analytics/how-gemini-in-bigquery-helps-with-data-engineering-tasks/
Added Date: February 19, 2025
Memo: Some good uses of Gemini in BigQuery for data engineering tasks, but now I'm concerned about my GCP bill ^^.