Building Zone Failure Resilience in Apache Pinot™ at Uber

Original URL: https://www.uber.com/en-IN/blog/building-zone-failure-resilience-in-apache-pinot-at-uber/

Article Written: November 6, 2023

Added: November 10, 2025

Type: tech2

Summary

The article discusses Uber's implementation of zone failure resilience (ZFR) in Apache Pinot, a real-time analytics platform. It details the strategies used to ensure that Pinot can withstand zone failures without impacting queries or data ingestion. By leveraging instance assignment capabilities and integrating with Uber's isolation groups, the article outlines how they achieved a robust deployment model that enhances operational efficiency and reliability. The migration process for existing clusters to this new setup is also highlighted, showcasing the challenges and solutions involved.

💭 Your Thoughts

- Pool-Based Instance Assignment is introduced to organize servers (and therefore the segments they host) into pools, so that multiple servers, such as all servers in one pool, can be restarted at the same time without taking any segment fully offline.
- Replica-Group Segment Assignment: each query is routed only to the servers within a single replica-group. To scale up the cluster, more replica-groups can be added without increasing the fanout of each query, so per-query performance is unaffected while overall throughput increases roughly linearly.
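The two ideas above can be sketched as a small, simplified model. This is not Pinot's implementation or API: the `Server` class, the function names, the "one replica-group per pool" rule and the round-robin choice of replica-group by a query counter are all hypothetical assumptions made for illustration. A pool here stands in for a zone or isolation group. The sketch shows why losing an entire pool leaves every segment available, and why adding replica-groups does not change per-query fanout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    pool: int  # hypothetical: pool id, standing in for a zone / isolation group


def assign_replica_groups(servers, num_replica_groups, servers_per_group):
    """Hypothetical pool-based assignment: each replica-group comes from a different pool."""
    pools = {}
    for s in servers:
        pools.setdefault(s.pool, []).append(s)
    pool_ids = sorted(pools)
    if len(pool_ids) < num_replica_groups:
        raise ValueError("need at least one pool per replica-group")
    groups = []
    for rg in range(num_replica_groups):
        candidates = sorted(pools[pool_ids[rg]], key=lambda s: s.name)
        if len(candidates) < servers_per_group:
            raise ValueError(f"pool {pool_ids[rg]} has too few servers")
        groups.append(candidates[:servers_per_group])
    return groups


def assign_segments(segments, replica_groups):
    """Place each segment once per replica-group, round-robin within the group.

    assignment[segment][rg] is the server holding that segment in replica-group rg.
    """
    return {
        seg: [group[i % len(group)] for group in replica_groups]
        for i, seg in enumerate(segments)
    }


def route_query(segments, replica_groups, assignment, query_id):
    """Send the whole query to ONE replica-group (round-robin by a hypothetical query id).

    Fanout is bounded by the number of servers in that group, not the cluster size.
    """
    rg = query_id % len(replica_groups)
    routing = {}
    for seg in segments:
        routing.setdefault(assignment[seg][rg].name, []).append(seg)
    return routing


def survives_pool_outage(assignment, down_pool):
    """True if every segment still has at least one replica outside the failed pool."""
    return all(
        any(s.pool != down_pool for s in replicas) for replicas in assignment.values()
    )


if __name__ == "__main__":
    servers = [Server(f"server-{p}-{i}", pool=p) for p in range(3) for i in range(2)]
    groups = assign_replica_groups(servers, num_replica_groups=3, servers_per_group=2)
    segments = [f"seg_{i}" for i in range(6)]
    assignment = assign_segments(segments, groups)
    print(route_query(segments, groups, assignment, query_id=0))
    print(all(survives_pool_outage(assignment, p) for p in range(3)))
```

Because each replica-group lives entirely in its own pool, a whole pool (zone) can be restarted or lost while another replica of every segment stays reachable. Because a query touches only one replica-group, adding a fourth group in a fourth pool adds capacity for more concurrent queries without adding servers to any single query's fanout.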

Technologies Referenced