Post-Training Generative Recommenders with Advantage-Weighted Supervised Finetuning

Original URL: https://netflixtechblog.com/post-training-generative-recommenders-with-advantage-weighted-supervised-finetuning-61a538d717a9

Article Written: October 20, 2023

Added: October 27, 2025

Type: tech2

Summary

This article discusses the challenges of post-training generative recommender systems and introduces a novel algorithm, Advantage-Weighted Supervised Finetuning (A-SFT). The authors explain why traditional reinforcement learning methods are limited in recommendation settings: counterfactual observations are unavailable and reward models are noisy. A-SFT aims to improve recommendation quality by combining supervised fine-tuning with reinforcement learning techniques. The reported results show A-SFT outperforming existing methods at aligning generative models with user preferences.
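The summary does not give the A-SFT objective, so the following is only a rough illustration of what "advantage-weighted" supervised fine-tuning can look like in general, not the article's formulation. It weights each logged item's negative log-likelihood by an exponentiated advantage, in the style of generic advantage-weighted regression. The function name, the exponential weighting, and the `baseline`, `beta`, and `max_weight` parameters are all assumptions made for this sketch.

```python
import math


def advantage_weighted_nll(log_probs, rewards, baseline, beta=1.0, max_weight=20.0):
    """Hypothetical advantage-weighted SFT loss over logged (observed) items.

    log_probs : model log-probabilities of the items users actually saw
    rewards   : reward-model scores for those same items (possibly noisy)
    baseline  : reference reward subtracted to form the advantage (assumed)
    beta      : temperature; larger values flatten the weights (assumed)
    max_weight: clip so one noisy reward cannot dominate the loss (assumed)
    """
    if len(log_probs) != len(rewards):
        raise ValueError("log_probs and rewards must have the same length")
    if not log_probs:
        raise ValueError("need at least one example")

    weighted_nll = 0.0
    weight_sum = 0.0
    for log_p, reward in zip(log_probs, rewards):
        advantage = reward - baseline
        weight = min(math.exp(advantage / beta), max_weight)
        weighted_nll += -weight * log_p
        weight_sum += weight
    # Normalize by total weight so the loss scale does not depend on beta.
    return weighted_nll / weight_sum
```

When every advantage is zero, all weights equal 1 and the loss reduces to the ordinary supervised fine-tuning loss. Only observed items are scored, which is why a scheme like this does not need counterfactual feedback.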

Data Problems Addressed

Recommendation with cold start

Challenges in Reward Model Generalization for Recommendations