Caution to those using ChatGPT for extremely large projects

GPT-5.4 loses 54% of its retrieval accuracy going from 256K to 1M tokens. Opus 4.6 loses 15%. Every major AI lab now claims a 1M token context window. GPT-5.4 launched eight days ago with 1M. Gemini 3.1 Pro has had it. But the number on the spec sheet and the number that actually works are two very different things. This chart uses MRCR v2, OpenAI’s own benchmark. It hides 8 identical pieces of information across a massive conversation and asks the model to find a specific one.