Justified or Just Convincing? Error Verifiability as a Dimension of LLM Quality

arXiv:2604.04418v1 Announce Type: cross As LLMs are deployed in high-stakes settings, users must judge the correctness of individual responses, often relying on model-generated justifications such as reasoning chains or explanations. Yet no standard measure exists for whether these justifications help users distinguish correct answers from incorrect ones.