Alibaba's Qwen team makes AI models think deeper with new algorithm

The Decoder
Generative AI Open Source AI Reinforcement Learning

Reinforcement learning hits a wall with reasoning models because every token gets the same reward. A new algorithm from Alibaba's Qwen team fixes this by weighting each step based on how much it shapes what comes next, doubling the length of thought processes in the process.