---
license: apache-2.0
base_model: Qwen/Qwen3-VL-32B-Instruct
tags:
- exllamav3
- exl3
- quantized
- 4-bit
- vision
- multimodal
- instruct
language:
- en
- it
- multilingual
library_name: exllamav3
pipeline_tag: image-text-to-text
---

# Qwen3-VL-32B-Instruct-EXL3-4.0bpw

ExLlamaV3 quantization of [Qwen/Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct), a vision-language model for multimodal tasks.

## Quantization Details

| Parameter | Value |
|-----------|-------|
| **Bits per Weight** | 4.0 bpw |
| **Head Bits** | 6 bpw |
| **Calibration Rows** | 128 |
| **Calibration Context** | 4096 tokens |
| **Format** | ExLlamaV3 (EXL3) |
| **Size** | ~19 GB |

## Model Capabilities

- **Vision Understanding**: Processes images at various resolutions
- **Video Analysis**: Frame-by-frame video understanding
- **Context Window**: Up to 128K tokens
- **Instruction Following**: Fine-tuned for chat and task completion
- **Multilingual**: Strong performance across languages

## Hardware Requirements

| GPU | VRAM | Notes |
|-----|------|-------|
| RTX 4090 | 24 GB | Good fit, comfortable with images |
| RTX 3090 | 24 GB | Works well |
| A100 40GB | 40 GB | Plenty of headroom |

## Use Cases

- **Live Assistant**: Real-time screen understanding
- **Document Processing**: Extract and analyze document content
- **Image Description**: Detailed visual descriptions
- **Visual Coding**: Understand code in screenshots
- **Chart/Graph Analysis**: Interpret data visualizations

## Usage with TabbyAPI

```yaml
# config.yml
model:
  model_dir: models
  model_name: Qwen3-VL-32B-Instruct-EXL3-4.0bpw
network:
  host: 0.0.0.0
  port: 5000
model_defaults:
  max_seq_len: 16384
  cache_mode: Q4
```

## Recommended Settings

- Temperature: 0.7
- Top-P: 0.8
- Top-K: 20
- Repetition Penalty: 1.05

## Comparison with Thinking Variant

| Model | Best For |
|-------|----------|
| **This (Instruct)** | Fast responses, direct answers, general tasks |
| **Thinking variant** | Complex reasoning, step-by-step analysis |

## Original Model

This is a quantization of [Qwen/Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct). All credit for the base model goes to the Qwen team at Alibaba.

## License

Apache 2.0 (inherited from the base model).
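
## Example Request

TabbyAPI serves an OpenAI-compatible chat completions endpoint, so the recommended sampler settings above can be passed in a standard request payload. Below is a minimal sketch of how such a payload could be built for an image-plus-text prompt; the model name matches the config above, but the exact sampler field names (`top_k`, `repetition_penalty`) and the data-URL image format are assumptions that should be checked against your server version.

```python
import base64


def build_chat_payload(image_bytes: bytes, prompt: str) -> dict:
    """Sketch of an OpenAI-style chat payload with this card's sampler settings."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "Qwen3-VL-32B-Instruct-EXL3-4.0bpw",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        # Image sent inline as a base64 data URL (assumed format)
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                    },
                ],
            }
        ],
        # Recommended settings from this card
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 20,
        "repetition_penalty": 1.05,
    }


payload = build_chat_payload(b"<png bytes here>", "Describe this image.")
```

The resulting dict can be POSTed to the server's `/v1/chat/completions` route with any HTTP client.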