Tempered Sequential Monte Carlo for Trajectory and Policy Optimization with Differentiable Dynamics

arXiv:2604.21456v1 Announce Type: new We propose a sampling-based framework for finite-horizon trajectory and policy optimization under differentiable dynamics by casting controller design as inference. Specifically, we minimize a KL-regularized expected trajectory cost, which yields an optimal "Boltzmann-tilted" distribution over controller parameters that concentrates on low-cost solutions as temperature decreases. To sample efficiently from this sharp, potentially multimodal target, we
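
As a rough illustration of the "Boltzmann-tilted" idea in the abstract: minimizing an expected cost plus a temperature-scaled KL penalty against a reference distribution is known to give a target proportional to the reference density times exp(-cost / temperature). The sketch below is not the paper's algorithm. It only reweights samples from an assumed Gaussian reference and shows how the weights concentrate on low-cost parameters as the temperature drops. The scalar parameter, the `toy_cost` function, the reference distribution, and the effective-sample-size diagnostic are all assumptions made for illustration.

```python
import math
import random


def boltzmann_weights(costs, temperature):
    """Self-normalized weights proportional to exp(-cost / temperature)."""
    scaled = [-c / temperature for c in costs]
    shift = max(scaled)  # subtract the max for numerical stability
    unnorm = [math.exp(s - shift) for s in scaled]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalized weights."""
    return 1.0 / sum(w * w for w in weights)


def toy_cost(theta):
    # Hypothetical two-basin cost standing in for a trajectory cost
    # (minima at theta = -1 and theta = +1).
    return (theta * theta - 1.0) ** 2


rng = random.Random(0)
# Samples from an assumed reference distribution N(0, 1.5^2)
# over a single scalar controller parameter.
thetas = [rng.gauss(0.0, 1.5) for _ in range(2000)]
costs = [toy_cost(t) for t in thetas]

for temperature in (10.0, 1.0, 0.1, 0.01):
    w = boltzmann_weights(costs, temperature)
    mean_cost = sum(wi * ci for wi, ci in zip(w, costs))
    ess = effective_sample_size(w)
    print(f"T={temperature:5.2f}  ESS={ess:8.1f}  weighted cost={mean_cost:.4f}")
```

In this toy setting the weighted cost falls as the temperature decreases, while the effective sample size at the lowest temperature is far below that at the highest. This loss of sample diversity when jumping straight to a sharp target is one common motivation for lowering the temperature gradually, as tempered sequential Monte Carlo methods do.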