Note: Some tensors were left uncompressed in the initial upload, due to an error on my part. The fixed weights have now been uploaded.

For more information (including how to compress models yourself), check out https://huggingface.co/DFloat11 and https://github.com/LeanModels/DFloat11

Feel free to request other models for compression as well (for the diffusers library, ComfyUI, or anything else), although models with architectures that are unfamiliar to me might be more difficult.

How to Use

I assume you are using some version of the official inference code from Microsoft (either from the official repo before it was taken down, or a community fork).

Install the DFloat11 pip package (installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed):
```
pip install dfloat11[cuda12]
# or if you have CUDA version 11:
# pip install dfloat11[cuda11]
```
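If you are not sure which extra to install, the following is a minimal sketch for choosing one. It assumes that the CUDA version PyTorch was built against (`torch.version.cuda`) is the one that matters, and the helper names `detect_cuda_version` and `dfloat11_pip_spec` are hypothetical, not part of the DFloat11 package.

```python
def detect_cuda_version():
    """Return the CUDA version PyTorch was built with, or None."""
    try:
        import torch
    except ImportError:
        return None
    # None on CPU-only builds, otherwise a string such as "12.1".
    return torch.version.cuda


def dfloat11_pip_spec(cuda_version):
    """Map a CUDA version string such as "12.1" to a pip requirement.

    Only the two extras listed in this README (cuda11, cuda12) are handled.
    """
    if not cuda_version:
        raise ValueError("No CUDA-enabled PyTorch build detected.")
    major = int(cuda_version.split(".")[0])
    if major not in (11, 12):
        raise ValueError(f"No extra listed here for CUDA {cuda_version}.")
    return f"dfloat11[cuda{major}]"


if __name__ == "__main__":
    try:
        print("pip install", dfloat11_pip_spec(detect_cuda_version()))
    except ValueError as err:
        print(err)
```

Run it inside the same Python environment you use for inference, so that it sees the same PyTorch build.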
To use the DFloat11 model, add the import statement `from dfloat11 import DFloat11Model`, then change the following lines of the model-loading code:

        # Load model
        self.model = VibeVoiceForConditionalGenerationInference.from_pretrained(
            self.model_path,
            torch_dtype=torch.bfloat16,
            device_map='cuda',
        )
        self.model.eval()

to:

        # Load model
        self.model = VibeVoiceForConditionalGenerationInference.from_pretrained(
            self.model_path,
            torch_dtype=torch.bfloat16,
            device_map='cpu',
        )
        DFloat11Model.from_pretrained(
            "mingyi456/VibeVoice-7B-DF11",
            device="cpu",
            bfloat16_model=self.model
        )
        self.model.eval()
        self.model.to("cuda")

Compression details

This is the pattern_dict for compression:

    pattern_dict = {
        r"lm_head": [],
        r"model\.language_model\.embed_tokens": [],
        r"model\.language_model\.layers\.\d+": (
            "self_attn.q_proj",
            "self_attn.k_proj",
            "self_attn.v_proj",
            "self_attn.o_proj",
            "mlp.gate_proj",
            "mlp.up_proj",
            "mlp.down_proj"
        ),

        r"model\.acoustic_connector": (
            "fc1",
            "fc2",
        ),
        r"model\.semantic_connector": (
            "fc1",
            "fc2",
        ),

        r"model\.acoustic_tokenizer\.encoder\.stages\.[456]\.\d+\.ffn": (
            "linear1",
            "linear2"
        ),

        r"model\.acoustic_tokenizer\.decoder\.stages\.[012]\.\d+\.ffn": (
            "linear1",
            "linear2"
        ),

        r"model\.semantic_tokenizer\.encoder\.stages\.[456]\.\d+\.ffn": (
            "linear1",
            "linear2"
        ),

        r"model\.prediction_head\.t_embedder\.mlp": (
            "0",
            "2"
        ),

        r"model\.prediction_head\.layers\.\d+": (
            "ffn.gate_proj",
            "ffn.up_proj",
            "ffn.down_proj",
            "adaLN_modulation.1"
        ),

        r"model\.prediction_head\.final_layer": (
            "linear",
            "adaLN_modulation.1"
        ),
    }
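
To see which layers a pattern_dict like the one above would cover, here is a rough sketch using only the standard `re` module. It rests on two assumptions of mine about how the patterns are interpreted: each key must match a module's full dotted name (`re.fullmatch`), and an empty list selects the matched module itself. The `expand_targets` helper and the module names are hypothetical and only for illustration; this is not the DFloat11 implementation.

```python
import re

# Small excerpt of the pattern_dict, shortened for illustration.
pattern_dict = {
    r"lm_head": [],
    r"model\.language_model\.layers\.\d+": (
        "self_attn.q_proj",
        "mlp.down_proj",
    ),
    r"model\.acoustic_tokenizer\.encoder\.stages\.[456]\.\d+\.ffn": (
        "linear1",
        "linear2",
    ),
}


def expand_targets(pattern_dict, module_names):
    """Return dotted names of the submodules the patterns would select.

    Assumption: keys are matched with re.fullmatch against the full module
    name, and an empty attribute list selects the matched module itself.
    """
    targets = []
    for name in module_names:
        for pattern, attrs in pattern_dict.items():
            if re.fullmatch(pattern, name):
                if attrs:
                    targets.extend(f"{name}.{attr}" for attr in attrs)
                else:
                    targets.append(name)
    return targets


# Hypothetical names, shaped like the keys of model.named_modules().
module_names = [
    "lm_head",
    "model.language_model.layers.0",
    "model.acoustic_tokenizer.encoder.stages.3.0.ffn",  # stage 3: no match
    "model.acoustic_tokenizer.encoder.stages.4.0.ffn",
]

selected = expand_targets(pattern_dict, module_names)
for target in selected:
    print(target)
```

On a real model you could pass `[name for name, _ in model.named_modules()]` as `module_names` to list the candidate layers before compressing.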

Base model: vibevoice/VibeVoice-7B