What should happen when you feed impossible moves into a chess-playing language model? [D]

r/MachineLearning

I'd appreciate some input on an experiment I've been mulling over. You can treat it as a straight-up interpretability experiment, but it would also have theoretical implications. Karvonen trained a 50M-parameter transformer on chess game transcripts: pure next-character prediction, with no rules and no explicit board representation given to it. It learned to play at ~1500 Elo and developed internal board-state representations that linear probes can read out. Critically, Karvonen's probing results show that the model learns a latent board-state representation anyway, even though nothing in the training objective asks for one.
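
For anyone who hasn't worked with probes, here is a minimal sketch of what "linear probes can read it" means in practice. To be clear, this is not Karvonen's code or setup: the activations are synthetic, and everything named here (`DIM`, `fake_activation`, the single binary "is this square occupied?" label) is a hypothetical stand-in I made up for illustration. The only point is that if a property is encoded along a direction in activation space, a plain logistic regression trained on (activation, label) pairs can recover it on held-out data.

```python
import math
import random

random.seed(0)

DIM = 16                      # hypothetical activation width (real models are much wider)
N_TRAIN, N_TEST = 400, 200    # arbitrary sizes for the synthetic dataset

# Assumption: the property ("is some square occupied?") is encoded along one
# fixed direction in activation space. This direction is invented, not measured.
true_dir = [random.gauss(0, 1) for _ in range(DIM)]

def fake_activation(label):
    """Synthetic activation: unit Gaussian noise shifted by +/- true_dir."""
    sign = 1.0 if label else -1.0
    return [random.gauss(0, 1) + sign * d for d in true_dir]

def make_split(n):
    """Return n synthetic (activation, label) pairs with balanced random labels."""
    labels = [random.random() < 0.5 for _ in range(n)]
    return [fake_activation(y) for y in labels], labels

def predict(w, b, x):
    """Probe output: sigmoid of a linear function of the activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))

def train_probe(xs, ys, epochs=20, lr=0.05):
    """Fit the linear probe with plain SGD on the logistic loss."""
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = predict(w, b, x) - (1.0 if y else 0.0)
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def accuracy(w, b, xs, ys):
    """Fraction of examples where the thresholded probe matches the label."""
    hits = sum((predict(w, b, x) > 0.5) == y for x, y in zip(xs, ys))
    return hits / len(ys)

train_x, train_y = make_split(N_TRAIN)
test_x, test_y = make_split(N_TEST)
w, b = train_probe(train_x, train_y)
probe_acc = accuracy(w, b, test_x, test_y)
print(f"held-out probe accuracy: {probe_acc:.2f}")
```

With a real model you would swap `fake_activation` for residual-stream activations captured at some layer and take the labels from the actual board state at each position. High held-out accuracy from a probe this simple is the usual evidence that the information is linearly decodable, though it doesn't by itself show the model uses that representation to pick moves.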