Mistral Small 4 is kind of awful with images

r/LocalLLaMA

I first started testing with the Q4_K_M quant, and the image recognition was so bad that I assumed something was wrong with my setup. So I tried Mistral's official API, and its image capabilities are just as poor, which makes me think this is simply the model's actual ability.

Given the prompt "Describe this image in detail in around 200 words" and this picture of a music festival, here's the nonsense that Mistral Small 4 on the official API came up with:

"The image captures a vibrant scene at a large stadium during what appears to be an outdoor event, possibly a sports game or concert."