AI RESEARCH
Entropy Aware Reward Guidance for Diffusion Language Model Alignment
arXiv CS.AI
arXiv:2602.05000v2 Announce Type: replace-cross Reward guidance, also known as posterior sampling, is a popular method for test-time adaptation and post-