XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning

arXiv:2509.25174v2 Announce Type: replace-cross
Sample efficiency is a central property of effective deep reinforcement learning algorithms. Recent work has improved it by adding complexity, such as larger models, exotic network architectures, and more complex algorithms, which are typically motivated purely by empirical performance. We take a principled approach by focusing on the optimization landscape of the critic network. Using the eigenspectrum and condition number of the critic's Hessian, we systematically investigate the impact of common architectural design decisions on.
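
To make the two diagnostics named in the abstract concrete, here is a minimal sketch of how a Hessian eigenspectrum and condition number could be computed. It is not the paper's procedure: it assumes a toy linear critic Q(s) = w · phi(s) with two hand-picked features and a mean squared error loss, for which the Hessian is (1/N) Phi^T Phi and the eigenvalues of a symmetric 2x2 matrix have a closed form. All function names and the feature values are hypothetical, chosen only for illustration.

```python
import math


def hessian_linear_critic(features):
    """Hessian of L(w) = 1/(2N) * sum_i (w . phi_i - y_i)^2.

    For a linear critic this equals (1/N) * Phi^T Phi, so it depends
    only on the features, not on the weights w or the targets y.
    """
    n = len(features)
    d = len(features[0])
    return [
        [sum(phi[i] * phi[j] for phi in features) / n for j in range(d)]
        for i in range(d)
    ]


def eigenvalues_sym_2x2(h):
    """Closed-form eigenvalues (smallest, largest) of a symmetric 2x2 matrix."""
    a, b, c = h[0][0], h[0][1], h[1][1]
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean - radius, mean + radius


def condition_number(h):
    """Ratio of the largest to the smallest Hessian eigenvalue."""
    smallest, largest = eigenvalues_sym_2x2(h)
    return largest / smallest


# Hypothetical feature matrix: the second feature is 10x larger in scale,
# which stretches the loss surface along that direction.
features = [[1.0, 0.0], [0.0, 10.0], [1.0, 0.0], [0.0, 10.0]]
H = hessian_linear_critic(features)

print("Hessian:", H)
print("Eigenvalues:", eigenvalues_sym_2x2(H))
print("Condition number:", condition_number(H))
```

A large ratio between the extreme eigenvalues means that no single gradient step size suits every direction of the loss surface. For a deep critic the Hessian is not available in closed form and would have to be obtained by automatic differentiation or estimated numerically.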