Generalized Distributional Alignment Games for Unbiased Answer-Level Fine-Tuning

arXiv:2605.02435v1

The Distributional Alignment Game framework provides a powerful variational perspective on Answer-Level Fine-Tuning (ALFT). However, standard algorithms for these games rely on estimating logarithmic rewards from small batches,