Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play

arXiv:2604.17696v1 Announce Type: new

Games offer a compelling paradigm for developing general reasoning capabilities in language models, as they naturally demand strategic planning, probabilistic inference, and adaptive decision-making. However, existing self-play approaches rely solely on terminal game outcomes, providing no mechanism to distinguish transferable reasoning patterns from game-specific heuristics.