FRISM: Fine-Grained Reasoning Injection via Subspace-Level Model Merging for Vision-Language Models

arXiv:2601.21187v2 Announce Type: replace-cross Efficiently enhancing the reasoning capabilities of Vision-Language Models (VLMs) by merging them with Large Reasoning Models (LRMs) has emerged as a promising direction. However, existing methods typically operate at a coarse-grained layer level, which often leads to a trade-off between injecting reasoning capabilities and preserving visual capabilities.