Filter articles by tags or search for specific topics:
Original URL: https://medium.com/pinterest-engineering/last-mile-data-processing-with-ray-629affbf34ff
Added Date: September 4, 2024
Memo: Pinterest enhances machine learning dataset iteration speed by adopting Ray for distributed processing, addressing bottlenecks in dataset handling for recommender systems. Previously slow processes involving Apache Spark and Airflow workflows now leverage Ray's parallelization, resulting in a significant reduction in training time. Ray’s support for CPU/GPU resource management and streaming execution has led to increased throughput and cost savings, improving ML engineer velocity and overall efficiency in managing large-scale data.
Original URL: https://medium.com/adevinta-tech-blog/building-a-data-mesh-to-support-an-ecosystem-of-data-products-at-adevinta-4c057d06824d
Added Date: September 3, 2024
Memo: Adevinta's Central Product and Tech department has implemented a data mesh architecture to manage and deliver data products across its marketplaces. The initiative emphasizes domain-specific datasets, SQL accessibility, and datasets as products, with a focus on improving decision-making for data analysts, scientists, and product managers. Key strategies include centralized governance, domain-oriented data, and the establishment of working agreements to ensure data quality and alignment across decentralized teams.
Original URL: https://netflixtechblog.com/recommending-for-long-term-member-satisfaction-at-netflix-ac15cada49ef
Added Date: September 2, 2024
Memo: This article discusses how Netflix enhances long-term member satisfaction through personalized recommendations. By moving beyond traditional metrics like clicks or CTR, Netflix uses reward engineering to optimize for long-term satisfaction. The process involves defining proxy rewards based on user interactions, predicting delayed feedback, and aligning recommendations with long-term engagement. Challenges include dealing with delayed feedback, the disparity between online and offline metrics, and refining proxy rewards to better align with long-term satisfaction.
Original URL: https://engineering.atspotify.com/2024/08/unlocking-insights-with-high-quality-dashboards-at-scale/
Added Date: August 29, 2024
Memo: The article discusses Spotify's approach to creating and managing high-quality dashboards at scale. Spotify utilizes Tableau and Looker Studio as primary tools, supported by a Dashboard Quality Framework that ensures consistency and trust in the dashboards. The framework includes automatic checks ('Vital Signs') and a manual design checklist ('Spicy Dashboard Design'). The Dashboard Portal centralizes dashboard access, offering search, curation, and quality labeling features, enhancing the overall accessibility and reliability of dashboards across the company.
Original URL: https://medium.com/airbnb-engineering/automating-data-protection-at-scale-part-1-c74909328e08
Added Date: August 28, 2024
Memo: This article discusses Airbnb's development of a comprehensive Data Protection Platform (DPP) to address challenges in data security and privacy compliance. The platform integrates various services like Madoka for metadata management, Inspekt for data classification, and Cipher for encryption. It highlights the need for automated data protection due to the complexity of handling sensitive data across different environments and the importance of complying with global regulations like GDPR and CCPA.
Original URL: https://sarahsnewsletter.substack.com/p/choosing-a-data-quality-tool
Added Date: April 6, 2022
Memo: It's a good high-level summary, but I think each team still needs to run some spikes to find suitable tools for its own use case and project.
Original URL: https://roundup.getdbt.com/p/from-rows-to-people
Added Date: April 3, 2022
Memo:
Original URL: https://blog.fal.ai/the-unbundling-of-airflow-2/
Added Date: March 29, 2022
Memo: This article discusses the idea behind fal dbt, which extends dbt's capabilities on the Airflow platform. It also covers many other popular tools built around Airflow.
Original URL: https://dagster.io/blog/rebundling-the-data-platform
Added Date: March 28, 2022
Memo: I think Dagster has zoomed in from a job-level view to an asset/table-level view of pipelines. There are always pros and cons.
Original URL: https://cloud.google.com/blog/topics/customers/google-cloud-helps-uk-based-fluidly-scale
Added Date: March 24, 2022
Memo: It's a good showcase blog post for GCP, but it would be interesting to see more detail about how the Fluidly data team leverages GCP to launch its new data-driven business products.