AI RESEARCH

Where Does Authorship Signal Emerge in Encoder-Based Language Models?

arXiv CS.CL

arXiv:2605.19908v1 Announce Type: new Authorship attribution models fine-tuned with the same pretrained encoder, data, and loss can differ four-fold in performance depending only on their scoring mechanism. We use mechanistic interpretability tools to explain this gap. Stylistic features such as word length, punctuation density, and function-word frequency are equally available at every layer in every model, including in an off-the-shelf control encoder, so the gap does not come from representation quality.
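
To make the three stylistic features named above concrete, the following is a minimal sketch of how they might be computed from raw text. The abstract does not give the feature definitions, so the function name `stylistic_features`, the whitespace tokenization, the character-level punctuation density, and the short `FUNCTION_WORDS` list are all assumptions made for illustration, not the authors' method.

```python
import string

# Hypothetical, deliberately small function-word list (an assumption;
# the abstract does not specify which words are counted).
FUNCTION_WORDS = frozenset(
    {"the", "a", "an", "of", "and", "to", "in", "that", "it", "is"}
)


def stylistic_features(text: str) -> dict[str, float]:
    """Return three simple style statistics for a piece of text."""
    # Whitespace tokenization, then strip surrounding ASCII punctuation.
    words = [tok.strip(string.punctuation).lower() for tok in text.split()]
    words = [w for w in words if w]

    # Count ASCII punctuation characters over all characters in the text.
    n_punct = sum(ch in string.punctuation for ch in text)

    if not words:
        return {
            "mean_word_length": 0.0,
            "punctuation_density": 0.0,
            "function_word_frequency": 0.0,
        }

    return {
        "mean_word_length": sum(len(w) for w in words) / len(words),
        "punctuation_density": n_punct / len(text),
        "function_word_frequency": sum(w in FUNCTION_WORDS for w in words)
        / len(words),
    }


if __name__ == "__main__":
    print(stylistic_features("The cat sat, and it ran."))
```

Values like these could serve as regression targets for a probe trained on each layer's hidden states, which is one common way to test whether a feature is recoverable from a representation; whether the paper uses exactly this setup is not stated in the abstract.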