From Accuracy to Readiness: Metrics and Benchmarks for Human-AI Decision-Making

ArXi:2603.18895v1 Announce Type: cross Artificial intelligence (AI) systems are deployed as collaborators in human decision-making. Yet, evaluation practices focus primarily on model accuracy rather than whether human-AI teams are prepared to collaborate safely and effectively. Empirical evidence shows that many failures arise from miscalibrated reliance, including overuse when AI is wrong and underuse when it is helpful. This paper proposes a measurement framework for evaluating human-AI decision-making centered on team readiness.