SparseRL-Sync: Lossless Weight Synchronization with ~100x Less Communication

arXiv:2605.07330v1 Announce Type: cross In large-scale reinforcement learning (RL) systems with decoupled Trainer-Rollout execution, the Trainer must regularly synchronize policy weights to the Rollout side to limit policy staleness. When inter-node bandwidth is abundant, such synchronization is usually only a small fraction of end-to-end cost. As model size grows, however, the communication demand rises rapidly.