DeepSeek V4 isn't beating Opus, but it doesn't need to

DeepSeek V4 is not in the same league as GPT-5.5 or Opus 4.7. Benchmarks put it slightly below both of those, roughly on par with Opus 4.6. You can check the numbers yourself here: And yes, benchmarks only tell part of the story. In real-world usage, my experience is that V4 performs at around GPT-5.2 level: solid, consistent, and the best open-source model available right now. In practice, though, it doesn't quite reach Opus 4.6 either.