Preference Goal Tuning: Post-Training as Latent Control for Frozen Policies

arXiv:2412.02125v2 Announce Type: replace

Goal-conditioned policies enable decision-making models to execute diverse behaviors based on specified goals, yet their downstream performance is often highly sensitive to the choice of instructions or prompts. To bypass the limitations of discrete text prompts, we formulate post-