Tech Articles | DataFullStack

How Formula 1® uses generative AI to accelerate race-day issue resolution

Original URL: https://aws.amazon.com/blogs/machine-learning/how-formula-1-uses-generative-ai-to-accelerate-race-day-issue-resolution/

Added Date: February 20, 2025

Memo: A classic AWS Glue job pipeline that feeds Amazon Bedrock Knowledge Bases for a RAG use case.
How to use gen AI for better data schema handling, data quality, and data generation

Original URL: https://cloud.google.com/blog/products/data-analytics/how-gemini-in-bigquery-helps-with-data-engineering-tasks/

Added Date: February 19, 2025

Memo: Some good uses of Gemini on GCP for your data engineering tasks, but I'm concerned about my GCP bill now ^^.
Scaling Large Language Models for e-Commerce: The Development of a Llama-Based Customized LLM

Original URL: https://innovation.ebayinc.com/tech/features/scaling-large-language-models-for-e-commerce-the-development-of-a-llama-based-customized-llm-for-e-commerce/

Added Date: February 18, 2025

Memo:
Introducing Impressions at Netflix (part 1)

Original URL: https://netflixtechblog.com/introducing-impressions-at-netflix-e2b67c88c9fb

Added Date: February 17, 2025

Memo:
The Art of Secure Search: How Wix Mastered PII Data in Vespa Search Engine

Original URL: https://www.wix.engineering/post/the-art-of-secure-search-how-wix-mastered-pii-data-in-vespa-search-engine

Added Date: February 16, 2025

Memo:
Improving Recruiting Efficiency with a Hybrid Bulk Data Processing Framework

Original URL: https://www.linkedin.com/blog/engineering/data-streaming-processing/improving-recruiting-efficiency-with-hybrid-bulk-data-processing-framework

Added Date: February 15, 2025

Memo:
ZenML vs Flyte vs Metaflow

Original URL: https://mlops.community/zenml-vs-flyte-vs-metaflow/

Added Date: February 14, 2025

Memo:
From concept to reality: Navigating the Journey of RAG from proof of concept to production

Original URL: https://aws.amazon.com/blogs/machine-learning/from-concept-to-reality-navigating-the-journey-of-rag-from-proof-of-concept-to-production/

Added Date: February 13, 2025

Memo: A summary of all the concepts and technologies needed to build a production-ready RAG solution.
How Uber Uses Ray® to Optimize the Rides Business

Original URL: https://www.uber.com/en-GB/blog/how-uber-uses-ray-to-optimize-the-rides-business/

Added Date: February 7, 2025

Memo: Very nice! Uber runs Ray instances inside Spark executors. This setup lets each Spark task spawn Ray workers for parallel computation, which boosts performance significantly.
The foundations of Canva’s continuous data platform with Snowpipe Streaming

Original URL: https://www.canva.dev/blog/engineering/snowpipe-streaming/

Added Date: February 6, 2025

Memo: