Yet another Reward Hack ...
Dev.to AI
Generative AI
Reinforcement Learning
My RL model just found another annoying reward hack. It's a combat game (Toribash style). When it is winning on score, it beheads itself to end the match. Because that's an edge case I didn't predict, it loses the match (which it doesn't care about) but still gets the reward (which is all the model cares about). And the code sucks because I tried to write it with MiniMax 2.7, and the nerfed Claude Opus has trouble fixing it.
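One way to close this kind of loophole is to tie the terminal reward to the official match outcome instead of the raw score. The sketch below is only an illustration under assumptions: `MatchResult`, its fields, and `terminal_reward` are hypothetical names, not taken from the actual project, and the reward values (+1, 0, -1) are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class MatchResult:
    """Hypothetical end-of-match summary; field names are assumptions."""
    agent_score: float
    opponent_score: float
    agent_disqualified: bool  # e.g. the agent lost its own head


def terminal_reward(result: MatchResult) -> float:
    """Return a reward based on who actually won, not on the score alone."""
    # A disqualified agent loses the match, so it is penalized
    # even if it was ahead on score when the match ended.
    if result.agent_disqualified:
        return -1.0
    if result.agent_score > result.opponent_score:
        return 1.0
    if result.agent_score < result.opponent_score:
        return -1.0
    return 0.0  # draw
```

With this shape, a self-decapitation while ahead on score, such as `terminal_reward(MatchResult(10.0, 2.0, True))`, returns `-1.0`, so ending the match that way no longer pays off.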