Update README.md
README.md
@@ -18,7 +18,7 @@ Check the original model card for information about this model.
 # Running the model with VLLM in Docker
 ```sh
-sudo docker run --runtime nvidia --gpus all --ipc=host -p 8000:8000 -e VLLM_USE_FLASHINFER_MOE_FP4=1 vllm/vllm-openai:
+sudo docker run --runtime nvidia --gpus all --ipc=host -p 8000:8000 -e VLLM_USE_FLASHINFER_MOE_FP4=1 vllm/vllm-openai:latest Firworks/Kimi-Linear-48B-A3B-Instruct-nvfp4 --served-model-name kimi-48b-nvfp4 --max-model-len 32768 --tensor-parallel-size 2 --trust-remote-code --gpu-memory-utilization 0.7
 ```
 This was tested on a 2 x RTX Pro 6000 Blackwell cloud instance.
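Once the container is up, vLLM exposes an OpenAI-compatible API on the published port. A minimal smoke test, assuming the flags from the command above (port 8000, `--served-model-name kimi-48b-nvfp4`); the prompt and token limit here are arbitrary placeholders:

```sh
# Query vLLM's OpenAI-compatible chat completions endpoint.
# Assumes the container above is running on localhost:8000 with
# --served-model-name kimi-48b-nvfp4.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "kimi-48b-nvfp4",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64
      }'
```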