Apache Paimon is an open-source streaming data lake storage framework designed for both batch and real-time processing. It is optimized for high-throughput, low-latency workloads and serves as a lakehouse solution that integrates well with big data engines like Apache Flink, Apache Spark, and Trino.
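Since Paimon tables are typically defined and queried through engine SQL, a minimal Flink SQL sketch may help illustrate the integration. All names here (the catalog name, warehouse path, and table schema) are hypothetical, and a Flink SQL session with the Paimon connector on the classpath is assumed:

```sql
-- Register a Paimon catalog backed by a warehouse path (path is hypothetical)
CREATE CATALOG paimon_catalog WITH (
  'type' = 'paimon',
  'warehouse' = 's3://my-bucket/paimon'
);
USE CATALOG paimon_catalog;

-- A primary-key table; Paimon merges updates by key
CREATE TABLE user_events (
  user_id BIGINT,
  event_type STRING,
  event_time TIMESTAMP(3),
  PRIMARY KEY (user_id) NOT ENFORCED
);
```

The same table can then be written to by a streaming Flink job and read in batch mode by Flink, Spark, or Trino, which is the lakehouse pattern the intro describes.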
Tech tags:
Related shared contents:
- project2023-10-01: The article discusses the development and implementation of Spot Balancer, a tool created by Notion in collaboration with AWS, which optimizes the use of Spark on Kubernetes by balancing cost and reliability. It highlights the challenges of running Spark jobs on Spot Instances and how Spot Balancer gives finer control over executor placement to prevent job failures. The article outlines the transition from Amazon EMR to EMR on EKS and the benefits of dynamic provisioning and efficient resource management. Ultimately, the tool has helped Notion reduce Spark compute costs by 60-90% without sacrificing reliability.
- project2025-12-23: The article discusses the collaboration between AWS and Visa to introduce Visa Intelligent Commerce, which leverages Amazon Bedrock AgentCore to enable agentic commerce. This approach allows for seamless, autonomous payment experiences that reduce manual intervention in transactions. The article explains how intelligent agents can handle multi-step tasks in various sectors, particularly payments and shopping, transforming traditional workflows into more efficient, outcome-driven processes. It also highlights the technical architecture and tools involved in building these agentic workflows.
- project2025-11-13: The article discusses Yelp's transformation of its data infrastructure through the adoption of a streaming lakehouse architecture on AWS. This modernization addressed challenges with data processing latency, operational complexity, and compliance with regulations like GDPR. By migrating from self-managed Apache Kafka to Amazon MSK and implementing Apache Paimon for storage, Yelp achieved significant improvements, reducing analytics data latencies from 18 hours to minutes and cutting storage costs by over 80%. The article outlines the architectural shifts and technologies involved in this transformation.
In production with: