Knowledge Graphs are Implicit Reward Models: Path-Derived Signals Enable Compositional Reasoning

ArXi:2601.15160v3 Announce Type: replace Large language models have achieved near-expert performance in structured reasoning domains like mathematics and programming, yet their ability to perform compositional multi-hop reasoning in specialized scientific fields remains limited. We propose a bottom-up learning paradigm in which models are grounded in axiomatic domain facts and compose them to solve complex, unseen tasks. To this end, we present a post-