Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India

arXiv:2604.19151v1 Announce Type: new Existing Indic ASR benchmarks often use scripted, clean speech and leaderboard-driven evaluation that encourages dataset-specific overfitting. In addition, strict single-reference WER penalizes natural spelling variation in Indian languages, including non-standardized spellings of code-mixed English-origin words. To address these limitations, we