I tested 4 local VLMs as "bad hands" detectors. Here's which one works best as a judge

We all know that hands can be hard for small local models, so I tried to find the best way to detect bad hands with my local setup (GX10 Spark). I thought any VLM like Gemma would work, but not at all. So I had to test several of them, and here are my findings:

Qwen 3.5 122B is the sweet spot for a benchmark judge: 100% precision (it never raised a false flag) and decent recall. The cases it misses are subtle anatomy failures. Gemma 4 26B rejects everything, so it's useless. Qwen3-VL basically passes everything through, so it's useless too. Qwen 3.6 27B is a reasonable second opinion, but why bother.
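If you want to score your own judge the same way, here is a minimal sketch of how precision and recall fall out of the verdicts. The function name `precision_recall` and the toy lists are made up for illustration; they are not my actual test set or numbers. "Flagged" means the VLM said the hands are bad, and "is_bad" is your own manual label.

```python
from typing import Sequence, Tuple


def precision_recall(flagged: Sequence[bool], is_bad: Sequence[bool]) -> Tuple[float, float]:
    """Score a judge: flagged[i] is the VLM verdict, is_bad[i] is the manual label."""
    if len(flagged) != len(is_bad):
        raise ValueError("verdicts and labels must have the same length")

    # True positive: judge flagged an image that really has bad hands.
    tp = sum(1 for f, b in zip(flagged, is_bad) if f and b)
    # False positive: judge flagged an image whose hands are fine (a false flag).
    fp = sum(1 for f, b in zip(flagged, is_bad) if f and not b)
    # False negative: judge passed an image that really has bad hands (a miss).
    fn = sum(1 for f, b in zip(flagged, is_bad) if not f and b)

    # Convention used here: report 0.0 when the denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: 4 images with bad hands, 4 with good hands.
labels = [True, True, True, True, False, False, False, False]

# Three illustrative judge behaviours (not real model outputs).
careful_judge = [True, True, True, False, False, False, False, False]
reject_everything = [True] * 8
pass_everything = [False] * 8

for name, verdicts in [
    ("careful", careful_judge),
    ("reject everything", reject_everything),
    ("pass everything", pass_everything),
]:
    p, r = precision_recall(verdicts, labels)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

On this toy data the careful judge gets precision 1.0 and recall 0.75: no false flags, one miss. A judge that rejects everything gets perfect recall but only 0.5 precision, and a judge that passes everything scores 0 on both, which is why neither extreme is usable as a benchmark judge.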