---
license: apache-2.0
base_model: Qwen/Qwen3-VL-32B-Instruct
tags:
- exllamav3
- exl3
- quantized
- 4-bit
- vision
- multimodal
- instruct
language:
- en
- it
- multilingual
library_name: exllamav3
pipeline_tag: image-text-to-text
---

# Qwen3-VL-32B-Instruct-EXL3-4.0bpw

ExLlamaV3 quantization of [Qwen/Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct), a vision-language model for multimodal tasks.

## Quantization Details

| Parameter | Value |
|-----------|-------|
| **Bits per Weight** | 4.0 bpw |
| **Head Bits** | 6 bpw |
| **Calibration Rows** | 128 |
| **Calibration Context** | 4096 tokens |
| **Format** | ExLlamaV3 (EXL3) |
| **Size** | ~19 GB |

## Model Capabilities

- **Vision Understanding**: Processes images at various resolutions
- **Video Analysis**: Frame-by-frame video understanding
- **Context Window**: Up to 128K tokens
- **Instruction Following**: Fine-tuned for chat and task completion
- **Multilingual**: Strong performance across languages

## Hardware Requirements

| GPU | VRAM | Notes |
|-----|------|-------|
| RTX 4090 | 24 GB | Good fit, comfortable with images |
| RTX 3090 | 24 GB | Works well |
| A100 40GB | 40 GB | Plenty of headroom |

## Use Cases

- **Live Assistant**: Real-time screen understanding
- **Document Processing**: Extract and analyze document content
- **Image Description**: Detailed visual descriptions
- **Visual Coding**: Understand code in screenshots
- **Chart/Graph Analysis**: Interpret data visualizations

## Usage with TabbyAPI

```yaml
# config.yml
model:
  model_dir: models
  model_name: Qwen3-VL-32B-Instruct-EXL3-4.0bpw
network:
  host: 0.0.0.0
  port: 5000
model_defaults:
  max_seq_len: 16384
  cache_mode: Q4
```

## Recommended Settings

- Temperature: 0.7
- Top-P: 0.8
- Top-K: 20
- Repetition Penalty: 1.05

## Comparison with Thinking Variant

| Model | Best For |
|-------|----------|
| **This (Instruct)** | Fast responses, direct answers, general tasks |
| **Thinking variant** | Complex reasoning, step-by-step analysis |

## Original Model

This is a quantization of [Qwen/Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct). All credit for the base model goes to the Qwen team at Alibaba.

## License

Apache 2.0 (inherited from the base model).
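
## Example Request

TabbyAPI serves an OpenAI-compatible chat completions endpoint, so the recommended sampler settings above can be passed in a standard request payload. Below is a minimal sketch of how such a payload could be built for an image-plus-text prompt; the model name matches the config above, but the exact sampler field names (`top_k`, `repetition_penalty`) and the data-URL image format are assumptions that should be checked against your server version.

```python
import base64


def build_chat_payload(image_bytes: bytes, prompt: str) -> dict:
    """Sketch of an OpenAI-style chat payload with this card's sampler settings."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "Qwen3-VL-32B-Instruct-EXL3-4.0bpw",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        # Image sent inline as a base64 data URL (assumed format)
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                    },
                ],
            }
        ],
        # Recommended settings from this card
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 20,
        "repetition_penalty": 1.05,
    }


payload = build_chat_payload(b"<png bytes here>", "Describe this image.")
```

The resulting dict can be POSTed to the server's `/v1/chat/completions` route with any HTTP client.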