Tagarela - A Portuguese speech dataset from podcasts

arXiv:2603.15326v1 Announce Type: cross

Despite significant advances in speech processing, Portuguese remains under-resourced because public, large-scale, high-quality datasets are scarce. To address this gap, we present a new dataset, TAGARELA, composed of over 8,972 hours of podcast audio, specifically curated for