AI RESEARCH

Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL

arXiv CS.AI

arXiv:2603.19470v1 Announce Type: cross Off-policy problems such as policy staleness and