Firworks committed
Commit 4600bb1 · verified · 1 Parent(s): 66ac5fc

Update README.md

Files changed (1): README.md (+1 −1)
README.md CHANGED
@@ -18,7 +18,7 @@ Check the original model card for information about this model.
 
 # Running the model with VLLM in Docker
 ```sh
-sudo docker run --runtime nvidia --gpus all --ipc=host -p 8000:8000 -e VLLM_USE_FLASHINFER_MOE_FP4=1 vllm/vllm-openai:nightly Firworks/Kimi-Linear-48B-A3B-Instruct-nvfp4 --served-model-name kimi-48b-nvfp4 --max-model-len 32768 --tensor-parallel-size 2 --trust-remote-code --gpu-memory-utilization 0.7
+sudo docker run --runtime nvidia --gpus all --ipc=host -p 8000:8000 -e VLLM_USE_FLASHINFER_MOE_FP4=1 vllm/vllm-openai:latest Firworks/Kimi-Linear-48B-A3B-Instruct-nvfp4 --served-model-name kimi-48b-nvfp4 --max-model-len 32768 --tensor-parallel-size 2 --trust-remote-code --gpu-memory-utilization 0.7
 ```
 This was tested on a 2 x RTX Pro 6000 Blackwell cloud instance.
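
The `docker run` command above publishes vLLM's OpenAI-compatible API on port 8000 and serves the model under the name `kimi-48b-nvfp4`. A quick smoke test of the endpoint might look like the following sketch, assuming the container has finished loading the model and is reachable at `localhost:8000` (host and port are assumptions, not part of the commit):

```shell
# Hypothetical smoke test against the OpenAI-compatible chat endpoint.
# The model name must match the --served-model-name flag from the run command.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "kimi-48b-nvfp4",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 64
      }'
```

If the server is still loading weights, the request will fail until startup completes; `GET /v1/models` can be polled first to confirm readiness.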