AI RESEARCH

SimCT: Recovering Lost Supervision for Cross-Tokenizer On-Policy Distillation

arXiv CS.CL

arXiv:2605.07711v1

On-policy distillation (OPD) is a standard tool for transferring teacher behavior to a smaller student, but it implicitly assumes that teacher and student predictions are comparable token by token. That assumption fails whenever the two models tokenize the same text differently. Under heterogeneous tokenizers, exact shared-token matching silently discards a large fraction of the teacher signal at precisely the positions where the vocabularies disagree.
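
To make the loss of supervision concrete, here is a minimal sketch of exact shared-token matching on a toy example. It is an illustration only, not the method proposed in the paper: the two tokenizations and the helper names `token_spans` and `shared_token_positions` are hypothetical, and a "shared" token is assumed to mean one that covers exactly the same character span in both tokenizations.

```python
def token_spans(tokens):
    """Return the (start, end) character span covered by each token string."""
    spans = []
    pos = 0
    for tok in tokens:
        spans.append((pos, pos + len(tok)))
        pos += len(tok)
    return spans


def shared_token_positions(teacher_tokens, student_tokens):
    """Return (teacher_index, student_index) pairs covering identical spans."""
    if "".join(teacher_tokens) != "".join(student_tokens):
        raise ValueError("both tokenizations must spell the same text")
    # Map each student span to its token index for constant-time lookup.
    student_index = {span: j for j, span in enumerate(token_spans(student_tokens))}
    return [
        (i, student_index[span])
        for i, span in enumerate(token_spans(teacher_tokens))
        if span in student_index
    ]


# Hypothetical tokenizations of the same text by two different vocabularies.
teacher = ["un", "believ", "able", " results"]
student = ["unbe", "liev", "able", " res", "ults"]

matches = shared_token_positions(teacher, student)
# Fraction of teacher positions that exact matching cannot use.
dropped = 1 - len(matches) / len(teacher)

print(matches)  # [(2, 2)]
print(dropped)  # 0.75
```

In this toy case only "able" lines up in both tokenizations, so three of the four teacher positions contribute no supervision under exact matching.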