Would it be better to fine-tune Qwen3.5 or Qwen3-VL for an OCR task?

r/LocalLLaMA

I have a set of documents with complex table structures, and every small OCR model I've tried fails on one case or another. My use case is converting document pages to markdown. Qwen3-VL-32B gave quite accurate results, but it's too big for my machine and the throughput I need. I'm thinking of fine-tuning a 4B or 8B/9B Qwen model to get better performance. I'm not sure whether a dedicated VLM like Qwen3-VL would be better, or the newer all-in-one Qwen3.5. This would also be my first time fine-tuning, so any advice on that is appreciated.