Hide to See: Reasoning-prefix Masking for Visual-anchored Thinking in VLM Distillation

arXiv:2605.11651v1 Announce Type: cross Recent think-answer approaches in VLMs, such as Qwen3-VL-Thinking, boost reasoning performance by leveraging intermediate thinking steps before the final answer, but their high computational cost limits real-world deployment. To distill such capabilities into compact think-answer VLMs, a primary objective is to improve the student's ability to utilize visual evidence throughout its reasoning trace. To this end, we