Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck

arXiv:2603.08462v1 Announce Type: new Chain-of-Thought (CoT) prompting improves LLM accuracy on complex tasks but often increases token usage and inference cost. Existing "Budget Forcing" methods, which reduce cost via fine-tuning with heuristic length penalties, suppress both essential reasoning and redundant filler. We recast efficient reasoning as a lossy compression problem under the Information Bottleneck (IB) principle, and identify a key theoretical gap when applying naive IB to transformers: attention violates the Markov property between prompt, reasoning trace, and response.
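
For orientation, the textbook IB objective and the conditional-independence condition at issue can be written out as follows. This is a sketch in assumed notation, not the paper's own formulation: the symbols X (prompt), Z (reasoning trace), Y (response), and the trade-off weight \beta are introduced here for illustration, and reading the Markov property as the chain X -> Z -> Y is one plausible interpretation of the abstract.

```latex
% Assumed notation (not taken from the source):
%   X = prompt, Z = reasoning trace, Y = response, \beta > 0 = trade-off weight.
\begin{align}
  % Standard IB Lagrangian: compress X into Z while keeping Z informative about Y.
  &\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y) \\
  % A Markov chain X -> Z -> Y would require the response to depend on the
  % prompt only through the trace:
  &p(y \mid x, z) = p(y \mid z)
    \quad\Longleftrightarrow\quad I(X; Y \mid Z) = 0 \\
  % A decoder whose attention spans both prompt and trace conditions on both,
  % so the equality above does not hold in general:
  &p(y \mid x, z) \neq p(y \mid z) \quad \text{in general}
\end{align}
```

Under this reading, shortening the trace Z does not by itself cap the information the response receives about the prompt, since the response can still attend to X directly.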