MiniMax-M2-BF16-W4A16

This repository contains a quantized checkpoint produced with llm-compressor from the base model MiniMaxAI/MiniMax-M2.

What this model is

This model was generated in the llm-compressor workspace using the MiniMax M2 quantization flow in examples/quantizing_moe/minimax_m2_example.py.

Reproduction steps:

Prepare environment and install llm-compressor dependencies.
Set model_id in examples/quantizing_moe/minimax_m2_example.py to the BF16 base checkpoint path.
Run the example script:
- python examples/quantizing_moe/minimax_m2_example.py
The script applies AWQModifier with W4A16 on MiniMax M2 MoE experts (w1/w2/w3) and saves the compressed checkpoint.
Output directory is created as:
- MiniMax-M2-BF16-W4A16

This is a derived quantized artifact, not an official upstream release from MiniMaxAI.
Inference quality/performance may differ from the original BF16 checkpoint depending on workload and hardware.

Safetensors

Model size

34B params

Tensor type

I64

I32

BF16

Base model

Quantized

(45)

this model