On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
Dev.to AI • Reinforcement Learning