Rethinking LLM Ensembling from the Perspective of Mixture Models

arXiv:2605.00419v1

Model ensembling is a well-established technique for improving the performance of machine learning models. Conventionally, this involves averaging the output distributions of multiple models and selecting the most probable label. This idea has been naturally extended to large language models (LLMs), yielding improved performance but incurring substantial computational cost.
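
The conventional scheme described above can be illustrated with a minimal sketch. It assumes each model returns a probability distribution over the same label set; the function name `ensemble_predict` and the toy numbers are hypothetical and do not come from the paper.

```python
def ensemble_predict(distributions):
    """Average per-model label distributions; return (best_label, averaged)."""
    if not distributions:
        raise ValueError("need at least one distribution")
    num_labels = len(distributions[0])
    if any(len(d) != num_labels for d in distributions):
        raise ValueError("all distributions must cover the same label set")
    # Uniform average over models, computed label by label.
    averaged = [
        sum(d[i] for d in distributions) / len(distributions)
        for i in range(num_labels)
    ]
    # Select the index of the most probable label under the averaged distribution.
    best_label = max(range(num_labels), key=lambda i: averaged[i])
    return best_label, averaged


# Toy outputs from three hypothetical models over three labels.
model_outputs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.6, 0.3],
]
label, avg = ensemble_predict(model_outputs)
print(label, avg)
```

Applied to LLMs, the same averaging would presumably run over the next-token distribution at every decoding step, with each model in the ensemble evaluated once per step. That repeated evaluation is one plausible reading of where the computational cost comes from.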