AI RESEARCH
AutoMonitor-Bench: Evaluating the Reliability of LLM-Based Misbehavior Monitor
arXiv cs.CL
arXiv:2601.05752v3 Announce Type: replace