Agentified Assessment of Logical Reasoning Agents

arXiv:2603.02788v2 Announce Type: replace

We present a framework for evaluating and benchmarking logical reasoning agents in settings where the assessment itself must be reproducible, auditable, and robust to execution failures. Building on agentified assessment, we use an assessor agent to issue tasks, enforce execution budgets, parse outputs, and record structured failure types, while the agent under test only needs to expose a standardized agent-to-agent interface. As a case study, we benchmark an auto-formalization agent for first-order logic (FOL) reasoning on a solver-verified and repaired split of.
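The assessor loop described above (issue a task, enforce a budget, parse the output, record a structured failure type) can be illustrated with a minimal sketch. Nothing here is taken from the paper's implementation: the names `Failure`, `Record`, `parse_answer`, `assess`, and `demo_agent` are hypothetical, the label set `true`/`false`/`uncertain` is an assumption, and a plain Python callable stands in for the standardized agent-to-agent interface.

```python
import concurrent.futures as cf
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Failure(Enum):
    """Hypothetical structured failure types recorded by the assessor."""
    TIMEOUT = "timeout"
    AGENT_ERROR = "agent_error"
    PARSE_ERROR = "parse_error"


LABELS = {"true", "false", "uncertain"}  # assumed label set, not from the source


@dataclass
class Record:
    task_id: str
    answer: Optional[str]
    failure: Optional[Failure]
    correct: bool


def parse_answer(raw: object) -> Optional[str]:
    """Normalize the agent's raw output; return None if it is not a known label."""
    if not isinstance(raw, str):
        return None
    text = raw.strip().lower()
    return text if text in LABELS else None


def assess(agent: Callable[[str], str],
           tasks: List[Tuple[str, str, str]],
           budget_s: float) -> List[Record]:
    """Run each (task_id, prompt, gold) task under a wall-clock budget."""
    records = []
    # One worker per task so a hung call cannot block the tasks after it.
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(tasks)))
    for task_id, prompt, gold in tasks:
        future = pool.submit(agent, prompt)
        try:
            raw = future.result(timeout=budget_s)
        except cf.TimeoutError:
            # The worker thread keeps running; we only stop waiting for it.
            records.append(Record(task_id, None, Failure.TIMEOUT, False))
            continue
        except Exception:
            records.append(Record(task_id, None, Failure.AGENT_ERROR, False))
            continue
        answer = parse_answer(raw)
        if answer is None:
            records.append(Record(task_id, None, Failure.PARSE_ERROR, False))
        else:
            records.append(Record(task_id, answer, None, answer == gold))
    pool.shutdown(wait=False)
    return records


def demo_agent(prompt: str) -> str:
    """Toy agent under test that triggers each outcome on demand."""
    if "slow" in prompt:
        time.sleep(0.5)
        return "true"
    if "crash" in prompt:
        raise RuntimeError("agent failed")
    if "garbled" in prompt:
        return "maybe?"
    return "True"


demo_tasks = [("t1", "ok", "true"), ("t2", "slow", "true"),
              ("t3", "crash", "false"), ("t4", "garbled", "false")]
demo_records = assess(demo_agent, demo_tasks, budget_s=0.1)
```

Each outcome becomes one `Record`, so a crash or timeout in the agent under test produces an auditable entry instead of aborting the run. A thread-based timeout only stops waiting for the call; a real harness would likely need process-level or network-level cancellation to reclaim resources.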