GoldenStart: Q-Guided Priors and Entropy Control for Distilling Flow Policies

arXiv:2603.14245v1 Announce Type: cross

Flow-matching policies hold great promise for reinforcement learning (RL) because they can capture complex, multi-modal action distributions. However, their practical use is often hindered by prohibitive inference latency and ineffective online exploration. Although recent works have used one-step distillation to speed up inference, the structure of the initial noise distribution remains an overlooked factor with significant untapped potential.
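To make the latency trade-off concrete, the following is a minimal NumPy sketch, not GoldenStart's method. A standard flow policy turns initial noise into an action by integrating a velocity field over many steps, so each action costs many network evaluations. A distilled one-step policy maps the same noise to an action in a single call. The velocity field, the names `velocity`, `multi_step_policy` and `one_step_policy`, and the closed-form "distilled" map are all hypothetical stand-ins chosen so the example runs without a trained network. The Gaussian draw for `z` marks where the initial noise distribution enters; the paper's Q-guided prior is not reproduced here.

```python
import numpy as np


def velocity(a, t, state):
    """Toy stand-in for a learned velocity field v(a, t | s).

    A real flow policy would use a neural network; this linear field is an
    assumption for illustration: it pulls the action toward tanh(state) and
    ignores t."""
    return np.tanh(state) - a


def multi_step_policy(state, z, num_steps):
    """Euler-integrate da/dt = v(a, t | s) from t=0 to t=1, starting at noise z.

    Returns the action and the number of velocity evaluations (NFE)."""
    a = z.copy()
    dt = 1.0 / num_steps
    nfe = 0
    for k in range(num_steps):
        a = a + dt * velocity(a, k * dt, state)
        nfe += 1
    return a, nfe


def one_step_policy(state, z):
    """Hypothetical distilled policy: maps noise z to an action in one call.

    For the toy linear field above, the ODE has the closed-form solution
    a(1) = target + (z - target) * exp(-1), which stands in for what a
    distilled student network would learn to approximate."""
    target = np.tanh(state)
    return target + (z - target) * np.exp(-1.0), 1


rng = np.random.default_rng(0)
state = np.array([0.5, -1.0])
# Initial noise from a standard Gaussian prior; the choice of this
# distribution is the design lever the abstract highlights.
z = rng.standard_normal(state.shape)

a_multi, nfe_multi = multi_step_policy(state, z, num_steps=1000)
a_one, nfe_one = one_step_policy(state, z)
print(nfe_multi, nfe_one)               # 1000 1
print(np.max(np.abs(a_multi - a_one)))  # small Euler discretization error
```

Both policies start from the same noise sample `z`, so the action produced by either one is fully determined by where that noise lands. This is why reshaping the initial distribution can steer a one-step policy without retraining its mapping.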