Filter articles by tags or search for specific topics:
Original URL: https://netflixtechblog.medium.com/data-gateway-a-platform-for-growing-and-protecting-the-data-tier-f1ed8db8f5c6
Added Date: October 10, 2024
Memo: Netflix's Data Gateway platform abstracts the complexities of distributed databases, providing scalable, secure, and reliable data access layers (DAL) through standardized gRPC and HTTP APIs.
Original URL: https://medium.com/pinterest-engineering/ray-infrastructure-at-pinterest-0248efe4fd52
Added Date: September 7, 2024
Memo: Pinterest's journey of adopting Ray for infrastructure enhancement started in 2023. It involved overcoming challenges like Kubernetes integration, optimizing resource utilization, and ensuring security. The Ray infrastructure enables scalable, efficient machine learning workloads, significantly improving last-mile data processing, batch inference, and recommender systems model training. By focusing on distributed processing, cost management, and developer velocity, Pinterest achieved improved scalability and operational efficiency for its machine learning applications.
Original URL: https://medium.com/pinterest-engineering/last-mile-data-processing-with-ray-629affbf34ff
Added Date: September 4, 2024
Memo: Pinterest enhances machine learning dataset iteration speed by adopting Ray for distributed processing, addressing bottlenecks in dataset handling for recommender systems. Previously slow processes involving Apache Spark and Airflow workflows now leverage Ray's parallelization, resulting in a significant reduction in training time. Ray’s support for CPU/GPU resource management and streaming execution has led to increased throughput and cost savings, improving ML engineer velocity and overall efficiency in managing large-scale data.
Original URL: https://medium.com/adevinta-tech-blog/building-a-data-mesh-to-support-an-ecosystem-of-data-products-at-adevinta-4c057d06824d
Added Date: September 3, 2024
Memo: Adevinta's Central Product and Tech department has implemented a data mesh architecture to manage and deliver data products across its marketplaces. The initiative emphasizes domain-specific datasets, SQL accessibility, and datasets as products, with a focus on improving decision-making for data analysts, scientists, and product managers. Key strategies include centralized governance, domain-oriented data, and the establishment of working agreements to ensure data quality and alignment across decentralized teams.
Original URL: https://netflixtechblog.com/recommending-for-long-term-member-satisfaction-at-netflix-ac15cada49ef
Added Date: September 2, 2024
Memo: This article discusses how Netflix enhances long-term member satisfaction through personalized recommendations. By moving beyond traditional metrics like clicks or CTR, Netflix uses reward engineering to optimize for long-term satisfaction. The process involves defining proxy rewards based on user interactions, predicting delayed feedback, and aligning recommendations with long-term engagement. Challenges include dealing with delayed feedback, the disparity between online and offline metrics, and refining proxy rewards to better align with long-term satisfaction.
Original URL: https://engineering.atspotify.com/2024/08/unlocking-insights-with-high-quality-dashboards-at-scale/
Added Date: August 29, 2024
Memo: The article discusses Spotify's approach to creating and managing high-quality dashboards at scale. Spotify utilizes Tableau and Looker Studio as primary tools, supported by a Dashboard Quality Framework that ensures consistency and trust in the dashboards. The framework includes automatic checks ('Vital Signs') and a manual design checklist ('Spicy Dashboard Design'). The Dashboard Portal centralizes dashboard access, offering search, curation, and quality labeling features, enhancing the overall accessibility and reliability of dashboards across the company.
Original URL: https://medium.com/airbnb-engineering/automating-data-protection-at-scale-part-1-c74909328e08
Added Date: August 28, 2024
Memo: This article discusses Airbnb's development of a comprehensive Data Protection Platform (DPP) to address challenges in data security and privacy compliance. The platform integrates various services like Madoka for metadata management, Inspekt for data classification, and Cipher for encryption. It highlights the need for automated data protection due to the complexity of handling sensitive data across different environments and the importance of complying with global regulations like GDPR and CCPA.
Original URL: https://sarahsnewsletter.substack.com/p/choosing-a-data-quality-tool
Added Date: April 6, 2022
Memo: It's a good high-level summary, but I think each team still needs to run a spike to find the tools that suit their use case and project.
Original URL: https://roundup.getdbt.com/p/from-rows-to-people
Added Date: April 3, 2022
Memo:
Original URL: https://blog.fal.ai/the-unbundling-of-airflow-2/
Added Date: March 29, 2022
Memo: This article explains the idea behind fal dbt, which extends dbt's capabilities on the Airflow platform. It also covers many other popular tools on Airflow.