Ray Data is a scalable data processing library for ML workloads. It provides flexible and performant APIs for scaling two kinds of work: offline batch inference, and data preprocessing and ingest for ML training. Ray Data uses streaming execution to efficiently process large datasets.
Related shared contents:
-
tutorial 2025-12-22
The article discusses the importance of memory in AI agents, particularly how it enables them to learn from past interactions and improve their performance over time. It categorizes memory into three types: session memory, user memory, and learned memory, each with distinct characteristics and benefits. The author provides code examples for implementing these memory types in agents, emphasizing the significance of learned memory for enhancing agent capabilities. The article concludes with a discussion of what constitutes good learning and the need for human oversight in the learning process.
-
project 2026-02-01
The article discusses Spotify's innovative multi-agent architecture designed to enhance its advertising platform. By addressing the fragmented decision-making processes across various advertising channels, the architecture aims to unify workflows and optimize campaign management through specialized AI agents. This approach allows for more efficient budget allocation, audience targeting, and overall campaign performance, leveraging historical data and machine learning. The article highlights the importance of a programmable decision layer and the challenges faced in implementing this system.
-
tech1 2026-01-14
The article discusses how Slack developed a comprehensive metrics framework to enhance the performance and cost-efficiency of their Apache Spark jobs on Amazon EMR. By integrating generative AI and custom monitoring tools, they achieved significant improvements in job completion times and cost reductions. The framework captures over 40 metrics, providing granular insights into application behavior and resource usage. The article outlines the architecture of their monitoring solution and the benefits of AI-assisted tuning for Spark operations.
-
vision 2026-01-01
The article discusses the evolving landscape of data engineering as it adapts to the needs of AI agents in an increasingly automated environment. It emphasizes the importance of building reliable, code-first data platforms that can handle multimodal data and provide context for agents. The shift from traditional data engineering tasks to high-level system supervision is highlighted, along with the necessity for safety and correctness in data pipelines. Ultimately, the article envisions a future where humans and AI agents collaborate seamlessly, transforming data engineering practices.
-
vision 2024-11-11
"AI-centric" data processing focuses on preparing and managing large-scale, multimodal datasets efficiently for AI model training, fine-tuning, and deployment, rather than traditional database queries. It involves optimizing computation across heterogeneous resources (CPUs/GPUs), improving data flow efficiency, and enabling scalability—all crucial for building next-generation AI models.
In productions with: