Best Local Vision-Language Models?

In your opinion, what are the best local vision models for getting a good description of a picture on a 16 GB GPU? At the moment I use Qwen3-VL 8B Thinking (Q8), but I wonder whether there is a better model around. Often the model does not really recognize the right kind of clothes and background.

submitted by /u/Kitchen_Carpenter195