AI RESEARCH

The Last Fingerprint: How Markdown Training Shapes LLM Prose

arXiv cs.CL

arXiv:2603.27006v1. Large language models produce em dashes at varying rates, and the observation that some models "overuse" them has become one of the most widely discussed markers of AI-generated text. Yet no mechanistic account of this pattern exists, and the parallel observation that LLMs default to markdown-formatted output has never been connected to it. We propose that the em dash is markdown leaking into prose: the smallest surviving unit of the structural orientation that LLMs acquire from markdown-saturated [...]