# GooseReason-4B-Instruct – MLX 4-bit Quantized
This is a 4-bit quantized MLX version of nvidia/Nemotron-Research-GooseReason-4B-Instruct, converted for efficient inference on Apple Silicon using MLX.
## Model Overview
| Attribute | Value |
|---|---|
| Original Model | nvidia/Nemotron-Research-GooseReason-4B-Instruct |
| Architecture | Qwen3 (4.4B parameters) |
| Quantization | 4-bit (MLX) |
| Base Model | Qwen3-4B-Instruct-2507 |
| Training Method | RLVR (Reinforcement Learning with Verifiable Rewards) |
| Max Sequence Length | 32,768 tokens |
| License | CC-BY-NC-4.0 |
## About GooseReason-4B
Nemotron-Research-GooseReason-4B-Instruct is NVIDIA's reasoning model built on Qwen3-4B-Instruct-2507 using RLVR. It achieves strong performance on math, code, and STEM reasoning benchmarks while remaining compact at 4B parameters.
### Key Capabilities
- Math Reasoning: Strong performance on AIME 2025 and AMC benchmarks
- Code Generation: Competitive on LiveCodeBench and HumanEval
- STEM: Broad science and technical reasoning capabilities
- Thinking Mode: Uses extended thinking (`<think>` tags) for complex reasoning tasks
### Benchmark Highlights
| Benchmark | GooseReason-4B |
|---|---|
| AIME 2025 (avg@64) | 55.0 |
| AMC (avg@64) | 82.2 |
| LiveCodeBench v6 (pass@1) | 30.1 |
| GPQA Diamond (avg@8) | 47.5 |
## Usage with MLX
```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate

model, tokenizer = load("DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-4bit")

messages = [
    {"role": "user", "content": "Solve: What is the sum of all prime numbers less than 20?"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=2048)
print(response)
```
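Because the model emits its chain of thought inside `<think>` tags, you may want to separate the reasoning from the final answer before displaying it. A minimal sketch, assuming the standard `<think>...</think>` tag format described above (the helper name is illustrative, not part of mlx-lm):

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split a response into (thinking, answer) around a <think>...</think> block."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No thinking block: the whole response is the answer
        return "", text.strip()
    thinking = match.group(1).strip()
    answer = text[match.end():].strip()
    return thinking, answer

thinking, answer = split_thinking(
    "<think>2 + 3 + 5 + 7 + 11 + 13 + 17 + 19 = 77</think>The sum is 77."
)
```

For long reasoning traces, keeping the `<think>` content around (e.g. behind a collapsible UI element) is often more useful than discarding it.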
## All Available Formats
| Variant | Link | Size |
|---|---|---|
| MLX 16-bit | DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-16bit | ~8.8 GB |
| MLX 8-bit | DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-8bit | ~4.6 GB |
| MLX 6-bit | DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-6bit | ~3.5 GB |
| MLX 4-bit | This repo | ~2.5 GB |
| Full Weights | nvidia/Nemotron-Research-GooseReason-4B-Instruct | ~8.8 GB |
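The sizes above roughly follow parameter count times bits per weight, plus per-group scale metadata that grouped quantization stores alongside the weights. A back-of-the-envelope sketch (the ~0.5 bits/weight overhead is an assumption for illustration, not an exact MLX figure; it applies only to the quantized variants, not the 16-bit weights):

```python
def approx_size_gb(n_params: float, bits: float, overhead_bits: float = 0.5) -> float:
    """Rough on-disk size: weights plus assumed per-group quantization metadata."""
    return n_params * (bits + overhead_bits) / 8 / 1e9

# ~4.4B parameters at 4-bit comes out near the ~2.5 GB listed above
print(round(approx_size_gb(4.4e9, 4), 1))
```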
## Acknowledgments
- NVIDIA for the GooseReason-4B model and RLVR research
- Qwen Team for the Qwen3-4B-Instruct-2507 base model
- Apple MLX Team for the MLX framework