A New End-to-end Framework for Evaluating Voice Agents (EVA)

Conversational voice agents present a distinct evaluation challenge. They must simultaneously satisfy two objectives: accuracy (completing the user's task correctly and faithfully) and conversational experience (doing so naturally, concisely, and in a way appropriate for spoken interaction). These objectives are deeply intertwined: mishearing a confirmation code renders perfect LLM reasoning meaningless, a wall of options overwhelms a caller who can't skim spoken output, and delayed responses c.