AI RESEARCH

Committed SAE-Feature Traces for Audited-Session Substitution Detection in Hosted LLMs

arXiv cs.AI

arXiv:2604.18179v1 Announce Type: cross Hosted-LLM providers have a silent-substitution incentive: advertise a stronger model while serving cheaper replies. Probe-after-return schemes such as SVIP leave a parallel-serve side-channel, since a dishonest provider can route the verifier's probe to the advertised model while serving ordinary users from a substitute. We propose a commit-open protocol that closes this gap.
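The abstract does not say how the commitment is constructed, so the following is only a generic hash-based commit-open sketch, not the paper's protocol. It assumes the provider can serialize some per-session trace to bytes and publish a commitment with each reply, before it knows which sessions will be audited. The names `commit` and `verify_opening` and the SHA-256 construction are illustrative assumptions.

```python
import hashlib
import hmac
import secrets


def commit(trace: bytes) -> tuple[bytes, bytes]:
    """Provider side: bind to `trace` without revealing it.

    Returns (commitment, nonce). The commitment is sent with the reply;
    the nonce stays private until the session is opened for audit.
    Hypothetical helper, not taken from the paper.
    """
    nonce = secrets.token_bytes(32)  # fixed length, so nonce + trace is unambiguous
    commitment = hashlib.sha256(nonce + trace).digest()
    return commitment, nonce


def verify_opening(commitment: bytes, nonce: bytes, trace: bytes) -> bool:
    """Verifier side: check that the opened (nonce, trace) matches the commitment."""
    expected = hashlib.sha256(nonce + trace).digest()
    return hmac.compare_digest(commitment, expected)


# Usage: commit at serve time, open only if the session is later audited.
commitment, nonce = commit(b"serialized-trace-for-session")
honest_opening_ok = verify_opening(commitment, nonce, b"serialized-trace-for-session")
swapped_opening_ok = verify_opening(commitment, nonce, b"trace-from-another-model")
```

Under these assumptions, every session carries a commitment made before the audit choice is known, so the provider cannot reserve the advertised model for probe traffic alone.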