Balancing Cost and Reliability for Spark on Kubernetes
Original URL: https://www.notion.com/blog/balancing-cost-and-reliability-for-spark-on-kubernetes
Article Written: October 1, 2023
Added: March 4, 2026
Type: project
Summary
The article discusses the development and implementation of Spot Balancer, a tool created by Notion in collaboration with AWS, which optimizes the use of Spark on Kubernetes by balancing cost and reliability. It highlights the challenges faced when using Spot Instances for Spark jobs and how Spot Balancer allows for better control over executor placement to prevent job failures. The article outlines the transition from Amazon EMR to EMR on EKS and the benefits of dynamic provisioning and efficient resource management. Ultimately, the tool has helped Notion reduce Spark compute costs by 60-90% without sacrificing reliability.
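The cost side of that trade-off can be illustrated with simple arithmetic. The sketch below is not Spot Balancer's actual logic; it only assumes that a fixed fraction of executors runs on Spot capacity and the rest on On-Demand. The function names and all prices are hypothetical, chosen purely for illustration.

```python
def blended_hourly_cost(executors, spot_fraction, on_demand_price, spot_price):
    """Hourly cost when a fixed fraction of executors runs on Spot capacity.

    All inputs are illustrative assumptions, not real AWS prices.
    """
    if not 0.0 <= spot_fraction <= 1.0:
        raise ValueError("spot_fraction must be between 0 and 1")
    # Round to a whole number of executors placed on Spot nodes.
    spot_executors = round(executors * spot_fraction)
    on_demand_executors = executors - spot_executors
    return spot_executors * spot_price + on_demand_executors * on_demand_price


def savings_vs_on_demand(executors, spot_fraction, on_demand_price, spot_price):
    """Fractional saving relative to running every executor On-Demand."""
    baseline = executors * on_demand_price
    blended = blended_hourly_cost(
        executors, spot_fraction, on_demand_price, spot_price
    )
    return 1.0 - blended / baseline


if __name__ == "__main__":
    # Hypothetical prices: On-Demand at 1.00 per hour, Spot at 0.30 per hour.
    for fraction in (0.0, 0.5, 0.8, 1.0):
        saving = savings_vs_on_demand(10, fraction, 1.00, 0.30)
        print(f"spot fraction {fraction:.1f}: saving {saving:.0%}")
```

Raising the Spot fraction increases the saving but also the share of executors exposed to interruption, which is the balance the article describes.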
💠Your Thoughts
Much of the optimisation depends on variations in AWS EC2 Spot Instance prices.