Apache Paimon

Apache Paimon is an open-source lake format for both streaming and batch processing. It combines a lake format with an LSM-tree structure to support high-throughput ingestion and low-latency queries, and it serves as a lakehouse storage layer that integrates with engines such as Apache Flink, Apache Spark, and Trino.

Related shared contents:

  • tutorial
    2026-04-11

    The article discusses the implementation of event-driven data agents using Google Cloud technologies like BigQuery, Pub/Sub, and the ADK on Vertex AI. It emphasizes the importance of real-time data processing to address issues such as financial fraud and supply chain disruptions. The architecture allows for immediate anomaly detection and autonomous investigation, thereby reducing the need for manual intervention. The article provides a detailed explanation of the components involved, including continuous queries in BigQuery and Single Message Transforms in Pub/Sub.

  • project
    2023-10-01

    The article discusses the development and implementation of Spot Balancer, a tool created by Notion in collaboration with AWS, which optimizes the use of Spark on Kubernetes by balancing cost and reliability. It highlights the challenges faced when using Spot Instances for Spark jobs and how Spot Balancer allows for better control over executor placement to prevent job failures. The article outlines the transition from Amazon EMR to EMR on EKS and the benefits of dynamic provisioning and efficient resource management. Ultimately, the tool has helped Notion reduce Spark compute costs by 60-90% without sacrificing reliability.

  • project
    2025-12-23

    The article discusses the collaboration between AWS and Visa to introduce Visa Intelligent Commerce, which leverages Amazon Bedrock AgentCore to enable agentic commerce. This new approach allows for seamless, autonomous payment experiences that reduce manual intervention in transactions. The article explains how intelligent agents can handle multi-step tasks in various sectors, particularly in payments and shopping, transforming traditional workflows into more efficient, outcome-driven processes. It also highlights the technical architecture and tools involved in building these agentic workflows.

  • project
    2025-11-13

    The article discusses Yelp's transformation of its data infrastructure through the adoption of a streaming lakehouse architecture on AWS. This modernization aimed to address challenges related to data processing latency, operational complexity, and compliance with regulations like GDPR. By migrating from self-managed Apache Kafka to Amazon MSK and implementing Apache Paimon for storage, Yelp achieved significant improvements, reducing analytics data latencies from 18 hours to minutes and cutting storage costs by over 80%. The article outlines the architectural shifts and technologies involved in this transformation.