What happens when you rip out the residual stream and replace it with a structured workspace (Research Paper - CWT)

r/LocalLLaMA
AI Research

Over the last month I've been working on a custom architecture that fully replaces the residual stream transformers use with a structured workspace. The goal isn't to claim "I beat transformers", it's a thought experiment into what happens structurally when you enforce a workspace instead, and where the compute actually goes. The findings were fun to discover and very interesting. CWT has 22.9M core compute (attn+FFN) vs 41.7M in the compute-matched baseline, and comes within 1.7% PPL, roughly a ~45% gap in core compute for near-equivalent quality.