Thinking with Drafting: Optical Decompression via Logical Reconstruction

arXiv:2602.11731v2 Announce Type: replace Existing multimodal large language models have achieved high-fidelity visual perception and exploratory visual generation. However, a precision paradox persists in complex reasoning tasks: optical perception systems transcribe symbols without capturing logical topology, while pixel-based generative models produce visual artifacts lacking mathematical exactness.