Alibaba's Qwen team makes AI models think deeper with new algorithm
The Decoder
•
Generative AI
Open Source AI
Reinforcement Learning
Reinforcement learning hits a wall with reasoning models because every token gets the same reward. A new algorithm from Alibaba's Qwen team fixes this by weighting each step based on how much it shapes what comes next, doubling the length of thought processes in the process.