Evaluating Semantic Fragility in Text-to-Audio Generation Systems Under Controlled Prompt Perturbations

arXiv:2603.13824v1 Announce Type: cross

Recent advances in text-to-audio generation enable models to translate natural-language descriptions into diverse musical output. However, the robustness of these systems under semantically equivalent prompt variations remains largely unexplored. Small linguistic changes may lead to substantial variation in generated audio, raising concerns about reliability in practical use. In this study, we evaluate the semantic fragility of text-to-audio systems under controlled prompt perturbations.