Decreased Intelligence Density in DeepSeek V4 Pro

In the V3.2 paper, the authors wrote: "Second, token efficiency remains a challenge; DeepSeek-V3.2 typically requires longer generation trajectories (i.e., tokens) to match the output quality of models like Gemini 3.0-Pro. Future work will focus on optimizing the intelligence density of the model’s reasoning chains to improve efficiency." However, in V4 Pro, the situation seems to have worsened. Even the non-thinking mode uses significantly more tokens than V3.2, and V4 Pro (1.6T) is roughly 2.5x larger than V3.2 (0.67T).
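
As a quick sanity check on the size comparison, here is a minimal sketch using only the two parameter counts quoted above. The function names and the `token_ratio` parameter are hypothetical and introduced for illustration. The cost proxy (total parameters times generated tokens) is a crude assumption that ignores how many parameters are actually active per token in a mixture-of-experts model, so treat it as a rough upper-level intuition rather than a measurement.

```python
# Total parameter counts in trillions, as quoted in the post.
V4_PRO_PARAMS_T = 1.6
V3_2_PARAMS_T = 0.67


def size_ratio(larger: float, smaller: float) -> float:
    """How many times larger one model is than the other."""
    return larger / smaller


def relative_cost(params_ratio: float, token_ratio: float) -> float:
    """Crude cost proxy (an assumption, not a measured quantity):
    cost scales with parameter count times tokens generated."""
    return params_ratio * token_ratio


ratio = size_ratio(V4_PRO_PARAMS_T, V3_2_PARAMS_T)
print(f"V4 Pro is about {ratio:.2f}x the size of V3.2")

# token_ratio is a placeholder: 1.0 means "same number of tokens per task".
# Substitute a measured value to see how extra tokens compound the size gap.
print(f"Relative cost at equal token usage: {relative_cost(ratio, 1.0):.2f}x")
```

The ratio comes out to about 2.4x, in line with the "roughly 2.5x" figure. Under this proxy, any increase in tokens per task multiplies on top of that gap, which is why higher token usage in a larger model is a concern.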