AI RESEARCH

ROSE: An Intent-Centered Evaluation Metric for NL2SQL

arXiv CS.AI

arXiv:2604.12988v1 Announce Type: cross Execution Accuracy (EX), the widely used metric for evaluating the effectiveness of Natural Language to SQL (NL2SQL) solutions, is becoming increasingly unreliable. It is sensitive to syntactic variation, ignores that questions may admit multiple interpretations, and is easily misled by erroneous ground-truth SQL. To address this, we