The Impact of Off-Policy Training Data on Probe Generalisation
arXiv cs.LG
arXiv:2511.17408v4 Announce Type: replace-cross Probing has emerged as a promising method for monitoring large language models (LLMs), enabling cheap inference-time detection of concerning behaviours. However, natural examples of many behaviours are rare, forcing researchers to rely on synthetic or off-policy LLM responses for