A post-transformer architecture just crushed LLMs on Sudoku Extreme. Is the transformer hitting a reasoning wall nobody wants to talk about?

Went down a rabbit hole this week. We've all been watching the reasoning-model arms race, and the working assumption is that if we just scale chain-of-thought hard enough, these models will eventually reason through anything. But there's a result that challenges that. A company called Pathway just published benchmark results on Sudoku Extreme, a dataset of about 250,000 of the hardest Sudoku puzzles. Their reported numbers: their model hits 97.4% accuracy (without CoT, tool-calling, or backtracking), while leading LLMs score near 0%.