MiniMax-M2-BF16-W4A16

This repository contains a quantized checkpoint produced with llm-compressor from the base model MiniMaxAI/MiniMax-M2.

What this model is

  • Base model: MiniMaxAI/MiniMax-M2
  • Quantization pipeline: llm-compressor
  • Quantization recipe: AWQModifier
  • Scheme: W4A16
  • Main quantized targets: MoE expert MLP weights (w1, w2, w3)
  • MoE router gates and lm_head are excluded from quantization, per the recipe (a sketch of the recipe follows this list)
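
For reference, here is a minimal sketch of what such a recipe looks like in llm-compressor. The target and ignore patterns below are illustrative assumptions; the authoritative version lives in examples/quantizing_moe/minimax_m2_example.py.

```python
from llmcompressor.modifiers.awq import AWQModifier

# Sketch of a W4A16 AWQ recipe: 4-bit weights, 16-bit activations.
# The regex patterns are assumptions standing in for the script's actual
# module names for the MoE expert projections (w1/w2/w3).
recipe = AWQModifier(
    targets=["re:.*experts.*(w1|w2|w3).*"],  # assumed expert-weight pattern
    scheme="W4A16",                          # 4-bit weights, 16-bit activations
    ignore=["lm_head", "re:.*gate.*"],       # router gates and lm_head stay in BF16
)
```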

How it was generated

This model was generated in the llm-compressor workspace using the MiniMax M2 quantization flow in examples/quantizing_moe/minimax_m2_example.py.

Reproduction steps:

  1. Prepare environment and install llm-compressor dependencies.
  2. Set model_id in examples/quantizing_moe/minimax_m2_example.py to the BF16 base checkpoint path.
  3. Run the example script:
    • python examples/quantizing_moe/minimax_m2_example.py
  4. The script applies AWQModifier with the W4A16 scheme to the MiniMax M2 MoE experts (w1/w2/w3) and saves the compressed checkpoint (a condensed sketch of this flow follows the list).
  5. The output directory is created as:
    • MiniMax-M2-BF16-W4A16
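
The core of that flow, condensed into a hedged sketch. The calibration dataset, sample count, and sequence length below are placeholders, not necessarily the script's actual values.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

MODEL_ID = "MiniMaxAI/MiniMax-M2"  # or a local BF16 checkpoint path
SAVE_DIR = "MiniMax-M2-BF16-W4A16"

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="bfloat16", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

# AWQ needs calibration data for its activation-aware scaling; the dataset
# name and counts here are placeholders, not the script's actual choices.
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=AWQModifier(scheme="W4A16", targets=["Linear"], ignore=["lm_head"]),
    max_seq_length=2048,
    num_calibration_samples=256,
)

# Save in compressed-tensors format alongside the tokenizer.
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```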

Notes

  • This is a derived quantized artifact, not an official upstream release from MiniMaxAI.
  • Inference quality and performance may differ from the original BF16 checkpoint depending on workload and hardware; a minimal loading sketch follows below.
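
A minimal loading sketch with vLLM, assuming a build that supports the MiniMax M2 architecture. vLLM reads the compressed-tensors W4A16 config from the checkpoint, so no explicit quantization flag is needed; the parallelism setting is a placeholder.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="ludovicoYIN/MiniMax-M2-BF16-W4A16",
    tensor_parallel_size=4,   # placeholder; size to your hardware
    trust_remote_code=True,
)
outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```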