PonderLM-2: Pretraining LLM with Latent Thoughts in Continuous Space

arXiv:2509.23184v4 Announce Type: replace The remarkable success of Chain-of-Thought (CoT), which enhances performance by scaling generation steps at test time, inspires us to ask: can we leverage a similar scaling of computational steps during pre