# GooseReason-4B-Instruct – MLX 4-bit Quantized

This is a 4-bit quantized version of nvidia/Nemotron-Research-GooseReason-4B-Instruct, converted with MLX for efficient inference on Apple Silicon.

## Model Overview

| Attribute | Value |
|---|---|
| Original Model | nvidia/Nemotron-Research-GooseReason-4B-Instruct |
| Architecture | Qwen3 (4.4B parameters) |
| Quantization | 4-bit (MLX) |
| Base Model | Qwen3-4B-Instruct-2507 |
| Training Method | RLVR (Reinforcement Learning with Verifiable Rewards) |
| Max Sequence Length | 32,768 tokens |
| License | CC-BY-NC-4.0 |

## About GooseReason-4B

Nemotron-Research-GooseReason-4B-Instruct is NVIDIA's reasoning model built on Qwen3-4B-Instruct-2507 using RLVR. It achieves strong performance on math, code, and STEM reasoning benchmarks while remaining compact at 4B parameters.

### Key Capabilities

- **Math Reasoning:** Strong performance on AIME 2025 and AMC benchmarks
- **Code Generation:** Competitive on LiveCodeBench and HumanEval
- **STEM:** Broad science and technical reasoning capabilities
- **Thinking Mode:** Uses extended thinking (`<think>` tags) for complex reasoning tasks
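Because the model can emit its chain of thought inside `<think>...</think>` tags before the final answer, downstream code often wants to separate the two. A minimal sketch (the `split_thinking` helper is hypothetical, not part of mlx-lm, and assumes the quantized model preserves the tag format):

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Separate <think>...</think> reasoning from the final answer."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No thinking block: the whole output is the answer
        return "", text.strip()
    thinking = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return thinking, answer

# Example with a mock response
raw = "<think>2 + 3 is 5.</think>The answer is 5."
thinking, answer = split_thinking(raw)
print(answer)  # The answer is 5.
```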

### Benchmark Highlights

| Benchmark | GooseReason-4B |
|---|---|
| AIME 2025 (avg@64) | 55.0 |
| AMC (avg@64) | 82.2 |
| LiveCodeBench v6 (pass@1) | 30.1 |
| GPQA Diamond (avg@8) | 47.5 |
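In the avg@k notation above, each problem is sampled k times and per-problem accuracy is averaged; pass@1 is the single-attempt case. A hypothetical sketch of the avg@k computation (the function name and layout are illustrative, not from the benchmark harness):

```python
def avg_at_k(results: list[list[bool]]) -> float:
    """avg@k: mean per-problem accuracy over k sampled attempts.

    results[i] holds the k correctness flags for problem i.
    """
    per_problem = [sum(attempts) / len(attempts) for attempts in results]
    return 100.0 * sum(per_problem) / len(per_problem)

# Two problems, 4 attempts each: 3/4 and 1/4 correct
print(avg_at_k([[True, True, True, False],
                [True, False, False, False]]))  # 50.0
```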

## Usage with MLX

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Load the 4-bit quantized model and its tokenizer from the Hub
model, tokenizer = load("DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-4bit")

messages = [
    {"role": "user", "content": "Solve: What is the sum of all prime numbers less than 20?"}
]

# Render the chat messages into a single prompt string
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=2048)
print(response)
```
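The example prompt has a verifiable answer, in the spirit of the model's RLVR training: the primes below 20 are 2, 3, 5, 7, 11, 13, 17, 19, which sum to 77. A small checker sketch (the `primes_below` helper is illustrative, not part of mlx-lm):

```python
def primes_below(n: int) -> list[int]:
    """Collect primes < n by trial division against smaller primes."""
    primes: list[int] = []
    for k in range(2, n):
        if all(k % p for p in primes):
            primes.append(k)
    return primes

expected = sum(primes_below(20))
print(expected)  # 77
```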

## All Available Formats

## Acknowledgments
