k6 k xl ud is bugged
k6 k xl ud is bugged. All the rest are fine.
That was fast!!! Outstanding :)
It's free, brother. I'm not seeing any vision models from the USA or Europe that can run at home... at least none that good.
Is this model really that bad at detecting multiple celebs in one image? Can someone confirm? Gemini 2.5 flash easily gets it right.
I've found the entire Qwen3-VL series to be the most accurate vision models I've ever used, even beating Gemini, ChatGPT, etc. I ran your image through Qwen3-VL-30B-A3B-Instruct. I know this thread is about the Thinking variant, but I can't load it atm because other users are currently on the Instruct variant on my server. It did really well, with only a couple of mistakes, which is completely reasonable for a model this size imo. Both the Instruct and Thinking versions of VL-30B-A3B have been performing extremely well for multimodal use in my experience. (I'm using the FP16 mmproj instead of FP32, btw.)
Maybe try implementing the Qwen-Agent "Zoom in tool", which lets the model zoom in on different parts of the image for better overall analysis.
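To make the zoom idea concrete, here is a rough sketch of the geometry such a tool needs: turn a bounding box the model asks for into a padded, clamped pixel crop. This is not Qwen-Agent's actual code. The function name `zoom_box`, the 0..1000 relative coordinate grid, and the `pad` parameter are all assumptions for illustration.

```python
import math


def zoom_box(bbox, width, height, pad=0.1, scale=1000):
    """Map a relative bbox to a padded pixel crop box (left, top, right, bottom).

    Hypothetical helper: assumes bbox is (x1, y1, x2, y2) on a 0..scale grid.
    """
    x1, y1, x2, y2 = bbox
    # Tolerate swapped corners
    x1, x2 = sorted((x1, x2))
    y1, y2 = sorted((y1, y2))
    # Relative coordinates -> pixel coordinates
    px1, px2 = x1 / scale * width, x2 / scale * width
    py1, py2 = y1 / scale * height, y2 / scale * height
    # Add context around the region, proportional to its size
    dx = (px2 - px1) * pad
    dy = (py2 - py1) * pad
    # Round outward, then clamp to the image bounds
    left = max(0, math.floor(px1 - dx))
    top = max(0, math.floor(py1 - dy))
    right = min(width, math.ceil(px2 + dx))
    bottom = min(height, math.ceil(py2 + dy))
    return left, top, right, bottom
```

The resulting tuple has the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so you could crop, upscale the crop, and feed it back to the model as a second image. Whether that actually helps with multiple faces in one picture would need testing.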
I wouldn't call it "a couple mistakes". It looks less than 50% correct to me, while Gemini Flash gets it 100% correct.

