# GooseReason-4B-Instruct – MLX 4-bit Quantized

This is a 4-bit quantized version of nvidia/Nemotron-Research-GooseReason-4B-Instruct, converted with MLX for efficient inference on Apple Silicon.

## Model Overview

| Attribute | Value |
|---|---|
| Original Model | nvidia/Nemotron-Research-GooseReason-4B-Instruct |
| Architecture | Qwen3 (4.4B parameters) |
| Quantization | 4-bit (MLX) |
| Base Model | Qwen3-4B-Instruct-2507 |
| Training Method | RLVR (Reinforcement Learning with Verifiable Rewards) |
| Max Sequence Length | 32,768 tokens |
| License | CC-BY-NC-4.0 |

## About GooseReason-4B

Nemotron-Research-GooseReason-4B-Instruct is NVIDIA's reasoning model built on Qwen3-4B-Instruct-2507 using RLVR. It achieves strong performance on math, code, and STEM reasoning benchmarks while remaining compact at 4B parameters.

### Key Capabilities

- **Math Reasoning:** Strong performance on AIME 2025 and AMC benchmarks
- **Code Generation:** Competitive on LiveCodeBench and HumanEval
- **STEM:** Broad science and technical reasoning capabilities
- **Thinking Mode:** Uses extended thinking (`<think>` tags) for complex reasoning tasks
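Because the model can emit its chain of thought inside `<think>...</think>` tags before the final answer, downstream code often wants to separate the two. A minimal sketch (the `split_thinking` helper is hypothetical, not part of mlx-lm, and assumes the quantized model preserves the tag format):

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Separate <think>...</think> reasoning from the final answer."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No thinking block: the whole output is the answer
        return "", text.strip()
    thinking = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return thinking, answer

# Example with a mock response
raw = "<think>2 + 3 is 5.</think>The answer is 5."
thinking, answer = split_thinking(raw)
print(answer)  # The answer is 5.
```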

### Benchmark Highlights

| Benchmark | GooseReason-4B |
|---|---|
| AIME 2025 (avg@64) | 55.0 |
| AMC (avg@64) | 82.2 |
| LiveCodeBench v6 (pass@1) | 30.1 |
| GPQA Diamond (avg@8) | 47.5 |
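In the avg@k notation above, each problem is sampled k times and per-problem accuracy is averaged; pass@1 is the single-attempt case. A hypothetical sketch of the avg@k computation (the function name and layout are illustrative, not from the benchmark harness):

```python
def avg_at_k(results: list[list[bool]]) -> float:
    """avg@k: mean per-problem accuracy over k sampled attempts.

    results[i] holds the k correctness flags for problem i.
    """
    per_problem = [sum(attempts) / len(attempts) for attempts in results]
    return 100.0 * sum(per_problem) / len(per_problem)

# Two problems, 4 attempts each: 3/4 and 1/4 correct
print(avg_at_k([[True, True, True, False],
                [True, False, False, False]]))  # 50.0
```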

## Usage with MLX

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Load the 4-bit quantized model and its tokenizer from the Hub
model, tokenizer = load("DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-4bit")

messages = [
    {"role": "user", "content": "Solve: What is the sum of all prime numbers less than 20?"}
]

# Render the chat messages into a single prompt string
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=2048)
print(response)
```
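The example prompt has a verifiable answer, in the spirit of the model's RLVR training: the primes below 20 are 2, 3, 5, 7, 11, 13, 17, 19, which sum to 77. A small checker sketch (the `primes_below` helper is illustrative, not part of mlx-lm):

```python
def primes_below(n: int) -> list[int]:
    """Collect primes < n by trial division against smaller primes."""
    primes: list[int] = []
    for k in range(2, n):
        if all(k % p for p in primes):
            primes.append(k)
    return primes

expected = sum(primes_below(20))
print(expected)  # 77
```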

## All Available Formats

## Acknowledgments
