AI RESEARCH
Built a normalizer so WER stops penalizing formatting differences in STT evals! [P]
r/MachineLearning
•
Hey guys! At my company, we've been benchmarking STT engines a lot and kept running into the same issue: WER is penalizing formatting differences that have nothing to do with actual recognition quality. "It's $50" vs "it is fifty dollars", "3:00PM" vs "3 pm". Both perfect transcription, but a terrible error rate. The fix is normalizing both sides before scoring, but every project we had a different script doing it slightly differently. So we built a proper library and open-sourced it. So we