Task-Dependent Evaluation of LLM Output Homogenization: A Taxonomy-Guided Framework

ArXi:2509.21267v3 Announce Type: replace Large language models often generate homogeneous outputs, but whether this is problematic depends on the specific task. For objective math tasks, responses may vary in terms of problem-solving strategy but should maintain the same verifiable answer. Whereas, for creative writing tasks, we often expect variation in key narrative components (e.g. plot, setting, etc.) beyond mere vocabulary diversity. Prior work on homogenization rarely conceptualizes diversity in a task-dependent way.