Balancing Cost and Reliability for Spark on Kubernetes

Original URL: https://www.notion.com/blog/balancing-cost-and-reliability-for-spark-on-kubernetes

Article Written: October 1, 2023

Added: March 4, 2026

Type: project

Summary

The article describes Spot Balancer, a tool Notion built in collaboration with AWS to balance cost and reliability when running Spark on Kubernetes. It explains the problems that arise when Spark jobs run on Spot Instances and how Spot Balancer gives finer control over executor placement to prevent job failures. It also covers Notion's move from Amazon EMR to EMR on EKS and the benefits of dynamic provisioning and efficient resource management. According to the article, the tool has helped Notion reduce Spark compute costs by 60-90% without sacrificing reliability.
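To make the cost side of that trade-off concrete, here is a minimal sketch of the arithmetic behind mixing Spot and On-Demand executors. It is not Spot Balancer's algorithm, which the summary does not describe. The function names, the `spot_fraction` parameter, and the prices in the usage note are all hypothetical and chosen only for illustration.

```python
def blended_hourly_cost(executors, spot_fraction, on_demand_price, spot_price):
    """Hourly cost of an executor fleet split between Spot and On-Demand.

    All parameters are illustrative assumptions, not values from the article.
    """
    if not 0.0 <= spot_fraction <= 1.0:
        raise ValueError("spot_fraction must be between 0 and 1")
    # Round to a whole number of executors placed on Spot capacity.
    spot_executors = round(executors * spot_fraction)
    on_demand_executors = executors - spot_executors
    return spot_executors * spot_price + on_demand_executors * on_demand_price


def savings_vs_on_demand(executors, spot_fraction, on_demand_price, spot_price):
    """Fraction of cost saved relative to running every executor On-Demand."""
    baseline = executors * on_demand_price
    blended = blended_hourly_cost(
        executors, spot_fraction, on_demand_price, spot_price
    )
    return 1.0 - blended / baseline
```

With a hypothetical On-Demand price of 1.00 per executor-hour and a Spot price of 0.25, placing half of 10 executors on Spot gives `savings_vs_on_demand(10, 0.5, 1.0, 0.25) == 0.375`. Raising `spot_fraction` increases savings but also increases exposure to Spot interruptions, which is the reliability side of the balance the article discusses.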

💭 Your Thoughts

Much of the optimisation relies on price variations in AWS EC2 Spot Instances.

Technologies Referenced