Squish and Release: Exposing Hidden Hallucinations by Making Them Surface as Safety Signals

arXiv:2603.26829v1 Announce Type: cross Language models detect false premises when asked directly but absorb them under conversational pressure, producing authoritative professional output built on errors they have already identified. This failure, order-gap hallucination, is invisible to output inspection because the error migrates into the activation space of the safety circuit, where it is suppressed but not erased. We