Voice Mapping of Text-to-Speech Systems: A Metric-Based Approach for Voice Quality Assessment

arXiv:2605.00861v1 Announce Type: cross This study investigates voice mapping as an evaluation framework for text-to-speech (TTS) synthesis quality. We analyze six influential TTS models, spanning historical and recent systems: Merlin, Tacotron 2, Transformer TTS, FastSpeech 2, Glow-TTS, and VITS. The metrics are crest factor, spectrum balance, and cepstral peak prominence (CPPs). The results demonstrate that voice range serves as a primary indicator of model capability, with VITS showing the largest range among the tested models.
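
As an illustration of the first metric named above, crest factor is conventionally the ratio of a signal's peak amplitude to its RMS amplitude, often expressed in dB. The following is a minimal sketch under that textbook definition only. The function name `crest_factor_db` and the synthetic test signals are hypothetical, and the abstract does not say how the study windows or computes the metric, so this should not be read as the authors' procedure.

```python
import math


def crest_factor_db(samples):
    """Crest factor in dB: 20 * log10(peak amplitude / RMS amplitude).

    Hypothetical helper for illustration; the study's own computation
    (windowing, per-cycle analysis, etc.) is not described in the abstract.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("crest factor is undefined for an all-zero signal")
    return 20.0 * math.log10(peak / rms)


# Synthetic signals (assumptions, not data from the study):
# one full cycle of a unit sine wave, and a unit square wave.
sine = [math.sin(2 * math.pi * k / 1000) for k in range(1000)]
square = [1.0 if k < 500 else -1.0 for k in range(1000)]

print(round(crest_factor_db(sine), 2))    # peak/RMS = sqrt(2), about 3.01 dB
print(round(crest_factor_db(square), 2))  # peak equals RMS, so 0.0 dB
```

A sine wave has a crest factor of about 3 dB and a square wave 0 dB, so higher values indicate a more peaked waveform.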