# MiniMax-M2-BF16-W4A16

This repository contains a quantized checkpoint produced with `llm-compressor` from the base model `MiniMaxAI/MiniMax-M2`.
## What this model is

- Base model: `MiniMaxAI/MiniMax-M2`
- Quantization pipeline: `llm-compressor`
- Quantization recipe: `AWQModifier`
- Scheme: W4A16
- Main quantized targets: MoE expert MLP weights (`w1`, `w2`, `w3`)
- MoE gates and `lm_head` are excluded from quantization per the recipe
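In a W4A16 scheme, weights are stored as 4-bit integers with shared per-group scales while activations stay in 16-bit floats. The following is a minimal pure-Python sketch of symmetric 4-bit group quantization to illustrate the idea; the group size and rounding are illustrative and this is not the actual AWQ algorithm:

```python
# Illustrative symmetric 4-bit group quantization (not the real AWQ
# implementation): each group of weights shares one scale, and values
# are rounded and clamped into the signed 4-bit range [-8, 7].

def quantize_w4(weights, group_size=128):
    """Quantize a flat list of floats to int4 values plus per-group scales."""
    q, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0  # avoid zero scale
        scales.append(scale)
        q.extend(max(-8, min(7, round(w / scale))) for w in group)
    return q, scales

def dequantize_w4(q, scales, group_size=128):
    """Recover approximate float weights from int4 values and scales."""
    return [v * scales[i // group_size] for i, v in enumerate(q)]

weights = [0.5, -1.0, 0.25, 0.75]
q, scales = quantize_w4(weights, group_size=4)
approx = dequantize_w4(q, scales, group_size=4)
assert all(-8 <= v <= 7 for v in q)
```

The per-group scale is what keeps the 4-bit representation accurate: outliers in one group do not flatten the resolution of every other group. AWQ additionally rescales channels based on activation statistics before quantizing, which this sketch omits.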
## How it was generated

This model was generated in the `llm-compressor` workspace using the MiniMax M2 quantization flow in `examples/quantizing_moe/minimax_m2_example.py`.
Reproduction steps:

1. Prepare the environment and install `llm-compressor` dependencies.
2. Set `model_id` in `examples/quantizing_moe/minimax_m2_example.py` to the BF16 base checkpoint path.
3. Run the example script:

   ```shell
   python examples/quantizing_moe/minimax_m2_example.py
   ```

4. The script applies `AWQModifier` with the W4A16 scheme to the MiniMax M2 MoE experts (`w1`/`w2`/`w3`) and saves the compressed checkpoint.
5. The output directory is created as `MiniMax-M2-BF16-W4A16`.
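A recipe equivalent to what the script configures might look like the following `llm-compressor` YAML fragment. This is a hedged sketch based on the general recipe schema, not copied from the example script; the stage name and the regex target/ignore patterns in particular are assumptions:

```yaml
# Hypothetical recipe sketch; key names follow the general
# llm-compressor recipe schema and may differ from the script.
quant_stage:
  quant_modifiers:
    AWQModifier:
      scheme: W4A16
      targets: ["re:.*experts.*(w1|w2|w3)$"]
      ignore: ["lm_head", "re:.*gate$"]
```

The `ignore` list reflects the exclusions above: MoE router gates and `lm_head` stay in BF16 because they are small but numerically sensitive.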
## Notes

- This is a derived quantized artifact, not an official upstream release from MiniMaxAI.
- Inference quality and performance may differ from the original BF16 checkpoint depending on workload and hardware.
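The main practical benefit of W4A16 is weight storage: 4-bit weights plus one 16-bit scale per group take roughly a quarter of the BF16 footprint. A back-of-envelope sketch, where the parameter count and group size are placeholders rather than the real MiniMax M2 figures, and which ignores that only the expert weights are quantized here:

```python
# Back-of-envelope weight-memory estimate. NUM_PARAMS and GROUP_SIZE
# are placeholders, not the actual MiniMax M2 configuration.
NUM_PARAMS = 100e9
GROUP_SIZE = 128

bf16_bytes = NUM_PARAMS * 2                  # 16 bits (2 bytes) per weight
w4_bytes = NUM_PARAMS * 0.5                  # 4 bits (0.5 bytes) per weight
scale_bytes = (NUM_PARAMS / GROUP_SIZE) * 2  # one 16-bit scale per group

w4a16_bytes = w4_bytes + scale_bytes
print(f"BF16:  {bf16_bytes / 1e9:.0f} GB")
print(f"W4A16: {w4a16_bytes / 1e9:.1f} GB")
```

Since this checkpoint leaves gates and `lm_head` in BF16, the real savings are smaller than this idealized ~4x, but expert MLP weights dominate a MoE model's footprint, so the reduction remains substantial.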
Model tree: `ludovicoYIN/MiniMax-M2-BF16-W4A16`, derived from base model `MiniMaxAI/MiniMax-M2`.