AI RESEARCH

Verifying LLM Inference to Detect Model Weight Exfiltration

arXiv CS.LG

ArXi:2511.02620v3 Announce Type: replace-cross As large AI models become increasingly valuable assets, the risk of model weight exfiltration from inference servers grows accordingly. An attacker controlling an inference server may exfiltrate model weights by hiding them within ordinary model responses, a strategy known as steganography. This work investigates how to verify LLM model inference to defend against such attacks and, broadly, to detect anomalous or buggy behavior during inference.