Ray | DataFullStack

Ray is a distributed execution framework that makes it easy to scale your applications and to leverage state-of-the-art machine learning libraries.

GitHub repository

Tech tags:

Related shared content:

How Uber Uses Ray® to Optimize the Rides Business

project

2025-01-09

Very nice! Uber runs Ray instances inside Spark executors. This setup allows each Spark task to spawn Ray workers for parallel computation, which boosts performance significantly.

Ray Infrastructure at Pinterest

project

2024-06-14

Pinterest began adopting Ray in its infrastructure in 2023. The work involved overcoming challenges such as Kubernetes integration, optimizing resource utilization, and ensuring security. The Ray infrastructure enables scalable, efficient machine learning workloads, significantly improving last-mile data processing, batch inference, and recommender systems model training. By focusing on distributed processing, cost management, and developer velocity, Pinterest improved scalability and operational efficiency for its machine learning applications.

Last Mile Data Processing with Ray

project

2023-09-12

Pinterest speeds up machine learning dataset iteration by adopting Ray for distributed processing, addressing bottlenecks in dataset handling for recommender systems. Previously slow processes built on Apache Spark and Airflow workflows now leverage Ray's parallelization, resulting in a significant reduction in training time. Ray's support for CPU/GPU resource management and streaming execution has increased throughput and reduced cost, improving ML engineer velocity and overall efficiency in managing large-scale data.

In production with: