When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling

arXiv:2510.15346v2 Announce Type: replace-cross

Ensembling Large Language Models (LLMs) has gained attention as a promising way to surpass the performance of individual models by leveraging their complementary strengths. In particular, aggregating the models' next-token probability distributions to select the next token has been shown to be effective across various tasks. However, while this approach has been successful for short-form answers, its application to long-form generation remains underexplored.
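
To make the idea of aggregating next-token probability distributions concrete, here is a minimal sketch of a single ensembling step. It rests on assumptions the abstract does not state: the models share a vocabulary, the aggregation is a uniform average, and the next token is chosen greedily. The function name `ensemble_next_token` and the toy distributions are hypothetical, and this is not necessarily the aggregation rule used in the paper.

```python
from typing import Dict, List


def ensemble_next_token(distributions: List[Dict[str, float]]) -> str:
    """Average several next-token distributions and return the most likely token.

    Assumption (not from the source): uniform averaging over a shared
    vocabulary, followed by greedy (argmax) selection.
    """
    if not distributions:
        raise ValueError("need at least one distribution")

    # Collect every token that any model assigns probability to.
    vocab = set().union(*distributions)

    # Uniform average; a token missing from a model counts as probability 0.
    averaged = {
        tok: sum(d.get(tok, 0.0) for d in distributions) / len(distributions)
        for tok in vocab
    }

    # Iterate in sorted order so ties are broken deterministically.
    return max(sorted(averaged), key=averaged.get)


# Toy example: the two models disagree, and the average decides.
model_a = {"Paris": 0.6, "London": 0.4}
model_b = {"Paris": 0.3, "London": 0.7}
next_token = ensemble_next_token([model_a, model_b])
```

In long-form generation a step like this would be repeated at every position, which is where cost and stability become a concern.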