Update README.md
README.md
@@ -18,7 +18,7 @@ Check the original model card for information about this model.
 # Running the model with VLLM in Docker
 ```sh
-sudo docker run --runtime nvidia --gpus all --ipc=host -p 8000:8000 -e VLLM_USE_FLASHINFER_MOE_FP4=1 vllm/vllm-openai:
+sudo docker run --runtime nvidia --gpus all --ipc=host -p 8000:8000 -e VLLM_USE_FLASHINFER_MOE_FP4=1 vllm/vllm-openai:latest Firworks/Kimi-Linear-48B-A3B-Instruct-nvfp4 --served-model-name kimi-48b-nvfp4 --max-model-len 32768 --tensor-parallel-size 2 --trust-remote-code --gpu-memory-utilization 0.7
 ```
 This was tested on a 2 x RTX Pro 6000 Blackwell cloud instance.
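Once the container is up, vLLM exposes an OpenAI-compatible API on the published port. A minimal smoke test, assuming the flags from the command above (port 8000, `--served-model-name kimi-48b-nvfp4`); the prompt and token limit here are arbitrary placeholders:

```sh
# Query vLLM's OpenAI-compatible chat completions endpoint.
# Assumes the container above is running on localhost:8000 with
# --served-model-name kimi-48b-nvfp4.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "kimi-48b-nvfp4",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64
      }'
```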