AI RESEARCH

KWBench: Measuring Unprompted Problem Recognition in Knowledge Work

arXiv cs.AI

arXiv:2604.15760v1

KWBench contains 223 tasks sourced from practitioners in acquisitions, contract negotiations, clinical pharmacy, organizational politics, fraud analysis, and incentive design. Each task encodes a formal game-theoretic pattern (principal-agent conflict, signaling, mechanism design failure, strategic omission, coalitional dynamics, or strategic interdependence). It also carries structured ground truth that records the expert's reading of the situation and the anticipated failure modes.
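The task structure described above can be sketched as a typed record. This is a minimal illustration that assumes a Python dataclass layout. Only the six pattern names come from the description. The class names (`Pattern`, `GroundTruth`, `Task`), the field names, the `count_by_pattern` helper, and the demo task are hypothetical and do not represent KWBench's actual schema or data.

```python
from dataclasses import dataclass
from enum import Enum


class Pattern(Enum):
    """The six game-theoretic patterns named in the KWBench description."""
    PRINCIPAL_AGENT_CONFLICT = "principal-agent conflict"
    SIGNALING = "signaling"
    MECHANISM_DESIGN_FAILURE = "mechanism design failure"
    STRATEGIC_OMISSION = "strategic omission"
    COALITIONAL_DYNAMICS = "coalitional dynamics"
    STRATEGIC_INTERDEPENDENCE = "strategic interdependence"


@dataclass(frozen=True)
class GroundTruth:
    # Hypothetical fields: the expert's reading of the situation and the
    # failure modes anticipated for a model that misses the pattern.
    expert_reading: str
    anticipated_failure_modes: tuple[str, ...]


@dataclass(frozen=True)
class Task:
    # Hypothetical record for one benchmark task; field names are illustrative.
    task_id: str
    domain: str  # e.g. "clinical pharmacy"
    pattern: Pattern
    prompt: str
    ground_truth: GroundTruth


def count_by_pattern(tasks):
    """Tally how many tasks encode each pattern (zero for absent patterns)."""
    counts = {p: 0 for p in Pattern}
    for t in tasks:
        counts[t.pattern] += 1
    return counts


# Invented placeholder task for illustration only; not from the benchmark.
example = Task(
    task_id="demo-1",
    domain="contract negotiations",
    pattern=Pattern.STRATEGIC_OMISSION,
    prompt="Review this draft term sheet before signing.",
    ground_truth=GroundTruth(
        expert_reading="The counterparty left out the indemnity cap.",
        anticipated_failure_modes=("summarizes terms without flagging the omission",),
    ),
)

print(count_by_pattern([example])[Pattern.STRATEGIC_OMISSION])  # prints 1
```

Keeping the pattern as an enum rather than a free-text string means a tally like `count_by_pattern` covers exactly the six named patterns and reports zero for any pattern with no tasks.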