Steer-to-Detect: Probing Hidden Representations for Detection of LLM-Generated Texts

ArXi:2605.12890v1 Announce Type: cross The rapid advancement of large language models (LLMs) has made machine-generated text increasingly difficult to distinguish from human-written text. While recent studies explore leveraging internal representations of language models to uncover deeper detection signals, these raw features often exhibit substantial overlap between classes, limiting their discriminative power. To address this challenge, we propose Steer-to-Detect (\texttt{S2D}), a two-stage framework for detecting LLM-generated text.