XDomainBench: Diagnosing Reasoning Collapse in High-Dimensional Scientific Knowledge Composition

arXiv:2605.14754v1 Announce Type: new Large Language Models (LLMs) are increasingly deployed for knowledge synthesis, yet their capacity for compositional generalization in scientific knowledge remains under-characterized. Existing benchmarks focus primarily on restricted single-turn scenarios and fail to capture the capability boundaries exposed by real-world interactive scientific workflows. To address this, we