AI RESEARCH
Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL
arXiv CS.AI
arXiv:2603.19470v1 Announce Type: cross Off-policy problems such as policy staleness and