Theory Under Construction: Orchestrating Language Models for Research Software Where the Specification Evolves

arXiv:2604.27209v1 Announce Type: cross

Large language models can now generate substantial code and draft research text, but research-software projects require more than either artifact alone. The mathematical thesis, executable system, benchmark surface, and public claims must mature together, yet they often drift apart. We identify two LM-specific failure modes: hallucination accumulation, in which claims exceed what the code or theory supports and unverified assertions propagate across sessions; and desynchronization, in which the code, the theory, or the model's own world model fall out of alignment.
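The abstract does not say how either failure mode is detected. The following is a minimal, hypothetical Python sketch, not the paper's method. It assumes each public claim is linked to the artifacts (code or theory files) that back it, recorded as content hashes at the time the claim was last checked. A claim with no linked evidence is flagged as unsupported, which is one symptom of hallucination accumulation. A claim whose linked artifact has since changed or disappeared is flagged as stale, which is one symptom of desynchronization. The names `Claim`, `digest`, and `audit` are illustrative only.

```python
import hashlib
from dataclasses import dataclass, field


def digest(text: str) -> str:
    """Content hash used to notice when an artifact has changed (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class Claim:
    text: str
    # Artifact name -> digest of that artifact when the claim was last verified.
    # Assumption: claims are linked to evidence this way; the paper does not specify it.
    evidence: dict[str, str] = field(default_factory=dict)


def audit(claims: list[Claim], artifacts: dict[str, str]) -> dict[str, list[str]]:
    """Sort claims into 'unsupported' (no evidence) and 'stale' (evidence changed)."""
    report: dict[str, list[str]] = {"unsupported": [], "stale": []}
    for claim in claims:
        if not claim.evidence:
            # Nothing backs this claim: it exceeds what code or theory supports.
            report["unsupported"].append(claim.text)
            continue
        for name, recorded in claim.evidence.items():
            current = artifacts.get(name)
            if current is None or digest(current) != recorded:
                # The backing artifact moved on without the claim being rechecked.
                report["stale"].append(claim.text)
                break
    return report


# Hypothetical project state: current contents of two artifacts.
artifacts = {
    "solver.py": "def solve(): return 1",
    "theorem.tex": "Theorem 1 ...",
}
claims = [
    Claim("Solver is exact", {"solver.py": digest("def solve(): return 1")}),
    Claim("Beats baseline by 2x"),  # no evidence linked
    Claim("Theorem 1 covers solver", {"theorem.tex": digest("old draft")}),
]
print(audit(claims, artifacts))
```

Running the audit at the start of each session, before a model is allowed to restate claims, is one way such a check could keep unverified assertions from carrying over between sessions.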