AP-BMM: Approximating Capability-Efficiency Pareto Sets of LLMs via Asynchronous Prior-guided Bayesian Model Merging

arXiv:2512.09972v5 Announce Type: replace

Navigating the capability-efficiency trade-off in Large Language Models (LLMs) requires approximating a high-quality Pareto set. Existing model merging research has focused predominantly on coarse model-level operators, which are easy to apply but offer limited control over the trade-off geometry. Layer-wise merging is more expressive, yet current methods still suffer from two bottlenecks: they treat the high-dimensional fusion space as an unstructured black box, and they rely on synchronous optimization even though LLM evaluation latency is highly uneven.