Log-Likelihood, Simpson's Paradox, and the Detection of Machine-Generated Text

arXiv:2605.06294v1

The ability to reliably distinguish human-written text from text generated by large language models is of profound societal importance. The dominant approach to this problem exploits the likelihood hypothesis: machine-generated text should appear more probable to a detector language model than human-written text does.
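The likelihood hypothesis turns into a simple scoring rule. Score a passage by its average log-likelihood under the detector model, then flag it as machine-generated when the score exceeds a threshold. The sketch below illustrates that rule only. It is not the paper's method. It assumes a toy add-alpha-smoothed character-bigram model in place of a real detector language model, which would instead supply per-token log-probabilities. The names `train_bigram`, `mean_log_likelihood`, `looks_machine_generated` and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter
from typing import Callable

# Hypothetical toy "detector": a character-bigram model standing in for a real LM.
LogProb = Callable[[str, str], float]


def train_bigram(corpus: str, alpha: float = 1.0) -> LogProb:
    """Fit an add-alpha smoothed character-bigram model on `corpus`."""
    vocab_size = len(set(corpus))
    pair_counts = Counter(zip(corpus, corpus[1:]))
    prev_counts = Counter(corpus[:-1])

    def log_prob(prev: str, cur: str) -> float:
        # Add-alpha (Laplace) smoothing: unseen bigrams still get nonzero probability.
        num = pair_counts[(prev, cur)] + alpha
        den = prev_counts[prev] + alpha * vocab_size
        return math.log(num / den)

    return log_prob


def mean_log_likelihood(text: str, log_prob: LogProb) -> float:
    """Average log-likelihood per character transition of `text` under the model."""
    if len(text) < 2:
        raise ValueError("text must contain at least two characters")
    total = sum(log_prob(a, b) for a, b in zip(text, text[1:]))
    return total / (len(text) - 1)


def looks_machine_generated(text: str, log_prob: LogProb, threshold: float) -> bool:
    """Likelihood-hypothesis rule: higher average log-likelihood => flag as machine text."""
    return mean_log_likelihood(text, log_prob) > threshold


if __name__ == "__main__":
    model = train_bigram("the cat sat on the mat. " * 20)
    for sample in ["the cat sat on the mat.", "zqx jvk wpf"]:
        print(sample, round(mean_log_likelihood(sample, model), 3))
```

Text that closely resembles what the model was fit on scores noticeably higher than unfamiliar character sequences. That gap is the signal likelihood-based detectors rely on. The threshold is a free parameter that would have to be calibrated on labelled data.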