Takeaways & discussion about the DeepSeek V4 architecture

Spent the morning looking at the V4 tech report. The benchmarks are getting deserved attention, but I think the architecture is also worth digging into. Quick thoughts below, to encourage feedback and discussion.

TL;DR: significant novelties compared to DeepSeek V3:
- Hybrid attention: CSA (compressed sparse attention) + HCA (heavily compressed attention), instead of going pure MLA or mixing in SSM / Gated DeltaNet layers like Qwen3.5+, Mamba, etc.
- Manifold-Constrained Hyper-Connections (mHC) replacing standard residuals (see the original mHC paper)
- FP4.
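For anyone who hasn't read the mHC paper, here is a rough sketch of the core idea. It rests on my reading of that paper, not on anything in the V4 report. The hidden state is widened into several parallel residual streams, and a learned matrix mixes them at each layer. mHC constrains that mixing matrix to be doubly stochastic (every row and column sums to 1) by running Sinkhorn-Knopp normalization on unconstrained logits. The function names `sinkhorn` and `mix_streams`, the 4-stream setup, and the iteration count are all hypothetical illustration choices, not DeepSeek's code.

```python
import math

def sinkhorn(logits, iters=100):
    """Push a square matrix of unconstrained logits toward a doubly stochastic
    matrix (rows and columns each sum to 1) via Sinkhorn-Knopp iterations.
    Assumption: this mirrors the mHC constraint on the residual mixing matrix."""
    # exp() makes every entry strictly positive, which Sinkhorn-Knopp needs to converge
    m = [[math.exp(v) for v in row] for row in logits]
    n = len(m)
    for _ in range(iters):
        # Normalize each row to sum to 1
        row_sums = [sum(row) for row in m]
        m = [[v / s for v in row] for row, s in zip(m, row_sums)]
        # Normalize each column to sum to 1
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m

def mix_streams(h_res, streams):
    """Residual-stream mixing: new_stream[i] = sum_j h_res[i][j] * streams[j].
    Each stream is one hidden vector (a list of floats)."""
    n, d = len(streams), len(streams[0])
    return [[sum(h_res[i][j] * streams[j][k] for j in range(n)) for k in range(d)]
            for i in range(n)]

# Hypothetical example: 4 residual streams, hidden size 2
logits = [[0.5, -1.0, 1.0, 0.0],
          [0.0, 0.3, -0.5, 1.0],
          [1.0, 0.0, 0.2, -0.3],
          [-0.5, 0.8, 0.0, 0.4]]
h_res = sinkhorn(logits)
streams = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
mixed = mix_streams(h_res, streams)

# Column sums of 1 mean the total signal summed across streams is conserved
print([round(sum(s[k] for s in mixed), 6) for k in range(2)])  # [16.0, 20.0]
```

The point of the constraint, as I understand it: because the column sums are 1, the sum across streams is preserved at every layer. Because the rows are convex combinations, the mixing can't blow up or shrink the signal the way an unconstrained hyper-connection matrix can when stacked over many layers. That makes it behave more like a well-conditioned identity residual.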