Subliminal Signals in Preference Labels

arXiv:2603.01204v2 Announce Type: replace As AI systems approach superhuman capabilities, scalable oversight increasingly relies on LLM-as-a-judge frameworks where models evaluate and guide each other's