CACTO-SL: Using Sobolev Learning to improve Continuous Actor-Critic with Trajectory Optimization

arXiv:2312.10666v2 Announce Type: replace-cross Trajectory Optimization (TO) and Reinforcement Learning (RL) are powerful and complementary tools for solving optimal control problems. On the one hand, TO can efficiently compute locally optimal solutions, but it tends to get stuck in local minima if the problem is not convex. On the other hand, RL is typically less sensitive to non-convexity, but it requires a much higher computational effort.