I discovered PaddleOCR-VL-1.5 and have been tinkering with it. How should I benchmark it?
As the title suggests, I discovered this model and ran a bunch of batch jobs. I found that my 1650 can't handle it in dedicated VRAM and has to spill into shared memory: 3.9 GB dedicated plus 2.0 GB shared, at about 24 seconds per page. So I tried the Q8 version, which works surprisingly well and needs only around 0.5-0.7 GB of shared memory. I was wondering why no one had made a Q4_K_M or similar quant for this particular model, since I read somewhere that quality holds up down to Q4. So I used llama.cpp to quantize it, and the model is now 240 MB, down from 950 MB. The vision model is still running in FP16.
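For the bench test itself, one option is to measure two things per quant: seconds per page, and character error rate (CER) against a reference transcription. Below is a minimal sketch of that idea, not anything from PaddleOCR or llama.cpp. `run_ocr` is a hypothetical callable (page in, text out) standing in for whatever actually invokes the model, and the references are assumed to be either hand-checked text or the FP16 output used as a baseline.

```python
import time
from statistics import mean


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


def benchmark(run_ocr, pages, references):
    """Time run_ocr on each page and score its output against a reference.

    run_ocr is a hypothetical callable (page -> str); it is an assumption
    of this sketch, not a real PaddleOCR or llama.cpp function.
    """
    times, errors = [], []
    for page, ref in zip(pages, references):
        start = time.perf_counter()
        text = run_ocr(page)
        times.append(time.perf_counter() - start)
        errors.append(cer(ref, text))
    return {"sec_per_page": mean(times), "mean_cer": mean(errors)}
```

Running `benchmark` once per variant (FP16, Q8, Q4_K_M) on the same set of pages would give a side-by-side view of speed versus accuracy. If the Q4_K_M CER stays close to the Q8 CER on those pages, that would support the claim that quality holds up at Q4, at least for that page set.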