UI Icon Detection with Qwen3.5, Qwen3.6 and Gemma4
r/LocalLLaMA
Hey everyone, I ran a small personal benchmark on using local models to detect UI icons in application screenshots. English is not my first language, so sorry for any grammar mistakes! I just wanted to share what I found in case it helps someone doing similar work.

Models tested (no quantization):
- Gemma4-31B-it
- Qwen3.5-27B
- Qwen3.6-35B-A3B

Approach: I feed the app screenshot into the model, ask it to recognize the UI icons, and have it return their bbox_2d coordinates. Once it returns the coordinates, I use the supervision library to draw red bounding boxes on the image.
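The step between "the model returns bbox_2d" and "draw the boxes" is parsing the reply. Below is a minimal sketch of that step. It assumes the reply is a JSON list of objects like `{"bbox_2d": [x1, y1, x2, y2], "label": "..."}`, possibly wrapped in a ```json fence. The `label` key, the function name `parse_bbox_response`, and the `coord_scale` option are my own illustrative choices, not taken from any model's documentation. Coordinate conventions differ between model families (some emit absolute pixels, some emit relative values such as a 0-1000 grid), so check the model card before choosing `coord_scale`.

```python
import json
import re
from typing import List, Optional, Tuple

# (x1, y1, x2, y2) in pixel coordinates of the screenshot
Box = Tuple[float, float, float, float]

# Matches a ```json ... ``` (or plain ```) fence around the model's answer
_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL)


def parse_bbox_response(
    text: str,
    image_size: Optional[Tuple[int, int]] = None,
    coord_scale: Optional[float] = None,
) -> List[Tuple[str, Box]]:
    """Extract (label, box) pairs from a model reply containing bbox_2d entries.

    ASSUMPTIONS (hypothetical format): the reply is a JSON list of objects like
    {"bbox_2d": [x1, y1, x2, y2], "label": "..."}. Set coord_scale (e.g. 1000)
    only if the model emits relative coordinates instead of pixels.
    """
    match = _FENCE_RE.search(text)
    payload = match.group(1) if match else text
    data = json.loads(payload)
    if isinstance(data, dict):  # tolerate a single object instead of a list
        data = [data]

    results: List[Tuple[str, Box]] = []
    for item in data:
        bbox = item.get("bbox_2d") if isinstance(item, dict) else None
        if not isinstance(bbox, list) or len(bbox) != 4:
            continue  # skip malformed entries instead of failing the whole reply
        x1, y1, x2, y2 = (float(v) for v in bbox)
        if coord_scale is not None:
            if image_size is None:
                raise ValueError("image_size is required when coord_scale is set")
            width, height = image_size
            # Map relative coordinates onto the screenshot's pixel grid
            x1, x2 = x1 * width / coord_scale, x2 * width / coord_scale
            y1, y2 = y1 * height / coord_scale, y2 * height / coord_scale
        # Normalize corner order so x1 <= x2 and y1 <= y2
        box = (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))
        results.append((str(item.get("label", "")), box))
    return results


if __name__ == "__main__":
    reply = '```json\n[{"bbox_2d": [100, 200, 300, 400], "label": "settings"}]\n```'
    print(parse_bbox_response(reply))
    # prints [('settings', (100.0, 200.0, 300.0, 400.0))]
    print(parse_bbox_response(reply, image_size=(2000, 1000), coord_scale=1000))
    # prints [('settings', (200.0, 200.0, 600.0, 400.0))]
```

Skipping malformed entries (rather than raising) is deliberate, since local models sometimes emit an extra object without coordinates. To draw the result, the boxes can be stacked into a NumPy array of shape (N, 4), wrapped in `sv.Detections(xyxy=...)`, and passed to a supervision `BoxAnnotator`. Verify the exact annotator arguments against the supervision version you have installed.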