Replay-buffer engineering for noise-robust quantum circuit optimization

arXiv:2604.21863v1 Announce Type: cross

Deep reinforcement learning (RL) for quantum circuit optimization faces three fundamental bottlenecks: replay buffers that ignore the reliability of temporal-difference (TD) targets, curriculum-based architecture search that triggers a full quantum-classical evaluation at every environment step, and the routine discard of noiseless trajectories when re