# Nexus-1.5B

Nexus-1.5B is a 1.54-billion-parameter mathematical reasoning model developed by Neuriton and trained with Length-Penalized Reward Optimization (LPRO), a novel reinforcement-learning alignment method that improves accuracy and response conciseness simultaneously.
Built on top of Qwen2.5-Math-1.5B-Instruct, Nexus-1.5B achieves 80.2 on MATH-500 and 85.2 on GSM8K (CoT), surpassing its base model by +4.4 points on MATH-500 while reducing average response length by 14%.
## What is LPRO?
Standard GRPO (Group Relative Policy Optimization) suffers from two key problems:
- Length bias — short responses receive disproportionately large gradient signals, implicitly penalizing long correct derivations.
- Entropy collapse — symmetric probability-ratio clipping causes the policy to converge to a narrow set of solution patterns, limiting further improvement.
LPRO fixes both with three targeted modifications:
| Component | What it does |
|---|---|
| Asymmetric clipping | Decouples the lower and upper clip bounds (ε_low=0.20, ε_high=0.28) to preserve policy entropy |
| Token-level normalization | Replaces the per-response weight 1/G with the global token-level weight 1/Σᵢ\|oᵢ\|, so every token carries equal weight regardless of response length |
| Length-penalized advantage | Adds a group-standardized length penalty: Âᵢ = (rᵢ − μᵣ)/(σᵣ + ε) − λ·(Lᵢ − μ_L)/(σ_L + ε) |
The final objective combines all three components:

$$
\mathcal{J}_{\mathrm{LPRO}}(\theta) = \mathbb{E}_{q,\,\{o_i\}_{i=1}^{G} \sim \pi_{\theta_{\mathrm{old}}}}\left[\frac{1}{\sum_{i=1}^{G}|o_i|}\sum_{i=1}^{G}\sum_{t=1}^{|o_i|}\min\!\Big(r_{i,t}(\theta)\,\hat{A}_i,\ \mathrm{clip}\big(r_{i,t}(\theta),\,1-\varepsilon_{\mathrm{low}},\,1+\varepsilon_{\mathrm{high}}\big)\,\hat{A}_i\Big)\right],
$$

where $r_{i,t}(\theta) = \pi_\theta(o_{i,t}\mid q, o_{i,<t})\,/\,\pi_{\theta_{\mathrm{old}}}(o_{i,t}\mid q, o_{i,<t})$ is the token-level probability ratio and $\hat{A}_i$ is the length-penalized advantage defined above.
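A minimal PyTorch sketch of these three pieces (an illustration, not the release training code; function names and tensor layout are assumptions):

```python
import torch

def lpro_advantages(rewards, lengths, lam=0.10, eps=1e-6):
    """Length-penalized group z-score advantage: A_i = z(r_i) - lam * z(L_i)."""
    r, L = rewards.float(), lengths.float()
    adv = (r - r.mean()) / (r.std() + eps)               # standardized reward
    return adv - lam * (L - L.mean()) / (L.std() + eps)  # minus standardized length

def lpro_loss(log_probs, old_log_probs, token_advantages,
              eps_low=0.20, eps_high=0.28):
    """Asymmetrically clipped surrogate, normalized over ALL tokens in the group."""
    ratio = torch.exp(log_probs - old_log_probs)         # r_{i,t}(theta)
    unclipped = ratio * token_advantages
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high) * token_advantages
    # .mean() over the flat token dimension implements the global 1/sum|o_i|
    # weight rather than a per-response 1/G average.
    return -torch.min(unclipped, clipped).mean()
```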
## Model Details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen2.5-Math-1.5B-Instruct |
| Parameters | 1.54B |
| Architecture | Transformer Decoder (28 layers, GQA, RoPE, SwiGLU, RMSNorm) |
| Context length | 8,192 tokens |
| Vocabulary size | 128,256 |
| Training method | LPRO (RL fine-tuning, no distillation) |
| Training data | 100 difficulty-filtered problems from MATH-500 |
| Group size G | 4 |
| Length penalty λ | 0.10 |
| Learning rate | 1e-6 |
| PPO epochs/iter | 4 |
## Benchmark Results

### Chain-of-Thought (CoT)
| Model | GSM8K | MATH-500 | MMLU-STEM | CMATH | GaoKao Cloze | GaoKao QA |
|---|---|---|---|---|---|---|
| Qwen2-Math-1.5B-Instruct | 84.2 | 69.4 | 54.9 | 79.6 | 59.7 | 50.7 |
| Qwen2.5-Math-1.5B-Instruct | 84.8 | 75.8 | 57.5 | 83.0 | 65.5 | 54.1 |
| Nexus-1.5B | 85.2 | 80.2 | 60.3 | 83.5 | 67.2 | 56.9 |
### Tool-Integrated Reasoning (TIR)
| Model | MATH-500 | Minerva Math | GaoKao 2023 EN | Olympiad Bench | College Math |
|---|---|---|---|---|---|
| Qwen2.5-Math-1.5B-Instruct | 80.0 | 34.0 | 68.0 | 49.0 | 54.0 |
| Nexus-1.5B | 84.0 | 40.0 | 74.0 | 56.0 | 57.0 |
### Ablation: Effect of Length Penalty (λ)
| λ | MATH-500 Acc. | Avg. Response Length |
|---|---|---|
| 0.0 (GRPO baseline) | 77.4 | 312 tokens |
| 0.1 (Nexus-1.5B) | 80.2 | 268 tokens |
| 0.3 (over-penalized) | 78.0 | 201 tokens |
**Key insight:** At λ = 0.1, accuracy and conciseness improve simultaneously. The length penalty acts as a de-noising regularizer, discouraging redundant steps rather than suppressing genuinely long derivations.
## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Dat1710/nexus-1.5b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Chain-of-Thought prompt
system_prompt = "Please reason step by step, and put your final answer within \\boxed{}."
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Find all functions f: ℝ⁺ → ℝ⁺ such that for each x ∈ ℝ⁺, there is exactly one y ∈ ℝ⁺ satisfying xf(y) + yf(x) ≤ 2."},
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048,
    temperature=0.7,
    do_sample=True,
)
# Strip the prompt tokens so only the newly generated answer is decoded
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
### Tool-Integrated Reasoning (TIR)

For TIR, swap in the tool-use system prompt:

```python
system_prompt = (
    "Please integrate natural language reasoning with programs to solve the problem above, "
    "and put your final answer within \\boxed{}."
)
```
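At inference time, TIR alternates between generation and code execution. Below is a minimal sketch of such a loop, assuming the model emits Python in fenced code blocks and signals its final answer with `\boxed{}`; the extraction regex, the subprocess call, and the message protocol are illustrative assumptions, not the official harness, which requires a proper sandbox:

```python
import re
import subprocess

def run_tir(model, tokenizer, messages, max_rounds=4):
    """Alternate generation and Python execution until a \\boxed{} answer appears."""
    reply = ""
    for _ in range(max_rounds):
        text = tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        inputs = tokenizer([text], return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=1024)
        reply = tokenizer.decode(out[0][inputs.input_ids.shape[1]:],
                                 skip_special_tokens=True)
        messages.append({"role": "assistant", "content": reply})
        if "\\boxed{" in reply:          # model produced its final answer
            break
        blocks = re.findall(r"```python\n(.*?)```", reply, re.DOTALL)
        if not blocks:
            break
        # WARNING: a bare subprocess is NOT a sandbox; shown for illustration only.
        result = subprocess.run(["python", "-c", blocks[-1]],
                                capture_output=True, text=True, timeout=30)
        messages.append({"role": "user",
                         "content": f"```output\n{result.stdout}\n```"})
    return reply
```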
## Evaluation Prompt Format

CoT (8-shot for GSM8K, 4-shot for MATH-500):

```
<|im_start|>system
Please reason step by step, and put your final answer within \boxed{}.<|im_end|>
<|im_start|>user
{problem}<|im_end|>
<|im_start|>assistant
```

TIR (zero-shot):

```
<|im_start|>system
Please integrate natural language reasoning with programs to solve the problem above,
and put your final answer within \boxed{}.<|im_end|>
<|im_start|>user
{problem}<|im_end|>
<|im_start|>assistant
```
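Few-shot evaluation can be reproduced by prepending worked examples as alternating user/assistant turns ahead of the test problem. A sketch, where `fewshot_examples` is an assumed list of (problem, worked solution) string pairs:

```python
# Assemble an n-shot CoT prompt (8 pairs for GSM8K, 4 for MATH-500).
messages = [{
    "role": "system",
    "content": "Please reason step by step, and put your final answer within \\boxed{}.",
}]
for shot_problem, shot_solution in fewshot_examples:
    messages.append({"role": "user", "content": shot_problem})
    messages.append({"role": "assistant", "content": shot_solution})
messages.append({"role": "user", "content": test_problem})
```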
## Training Details

### Data Curation
Training problems are sourced from MATH-500 and filtered by difficulty using a learnable-zone criterion: a problem is retained if, among 8 sampled solutions from the base model, between 2 and 5 are correct. This yields 100 training problems that provide meaningful gradient signal — neither trivially easy nor intractably hard.
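A sketch of this filter, where `sample_solutions` and `is_correct` are hypothetical helpers standing in for batched generation and symbolic answer checking:

```python
def in_learnable_zone(problem, base_model, n=8, lo=2, hi=5):
    """Keep a problem if the base model solves it sometimes, but not always."""
    solutions = sample_solutions(base_model, problem.text, n=n)
    n_correct = sum(is_correct(s, problem.answer) for s in solutions)
    return lo <= n_correct <= hi

train_set = [p for p in candidates if in_learnable_zone(p, base_model)]
```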
### Training Procedure
- Group sampling: For each prompt, sample G=4 responses from the current policy.
- Reward computation: Rule-based binary reward (correctness via symbolic answer matching) plus a small format bonus (α = 0.1) for a well-formed `\boxed{}` output.
- Advantage computation: Compute length-penalized group z-score advantages.
- Policy update: Maximize LPRO objective for 4 epochs per iteration.
- Iterate: Set old policy ← new policy and repeat.
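Put together, one LPRO iteration looks roughly like the following (a pseudocode-level sketch reusing `lpro_advantages` and `lpro_loss` from the earlier snippet; `snapshot`, `sample`, `reward_fn`, `policy_log_probs`, and `broadcast_to_tokens` are assumed helpers):

```python
# Schematic LPRO training loop: G = 4 responses per prompt and 4 PPO epochs
# per iteration, as in the Model Details table.
for iteration in range(num_iterations):
    old_policy = snapshot(policy)                        # freeze pi_old
    for prompt, answer in train_set:
        responses = [sample(old_policy, prompt) for _ in range(4)]
        rewards = torch.tensor([reward_fn(r, answer) for r in responses])
        lengths = torch.tensor([r.num_tokens for r in responses])
        adv = lpro_advantages(rewards, lengths, lam=0.10)
        token_adv = broadcast_to_tokens(adv, responses)  # A_i over each token of o_i
        old_logp = policy_log_probs(old_policy, prompt, responses).detach()
        for _ in range(4):                               # PPO epochs
            logp = policy_log_probs(policy, prompt, responses)
            loss = lpro_loss(logp, old_logp, token_adv)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```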
### Reward Function

$$
r_i = \mathbb{1}\big[\hat{a}(o_i) \equiv a^{\star}\big] + \alpha \cdot \mathbb{1}\big[o_i \text{ is well-formatted}\big], \qquad \alpha = 0.1,
$$

where $\hat{a}(o_i)$ is the answer extracted from the last `\boxed{}` expression in response $o_i$, verified via symbolic equivalence against the reference answer $a^{\star}$, and the second indicator awards the format bonus for a well-formed `\boxed{}` output.
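A hedged sketch of this reward using SymPy for the equivalence check (the extraction regex and the string-match fallback are assumptions; the official checker may differ):

```python
import re
from sympy import simplify
from sympy.parsing.sympy_parser import parse_expr

def extract_boxed(text):
    """Return the contents of the last \\boxed{...} (non-nested braces only)."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None

def reward(response, reference, alpha=0.1):
    pred = extract_boxed(response)
    if pred is None:
        return 0.0                       # no well-formed box: no bonus, no credit
    try:                                 # symbolic equivalence check
        correct = simplify(parse_expr(pred) - parse_expr(reference)) == 0
    except Exception:                    # e.g. LaTeX that SymPy cannot parse
        correct = pred.strip() == reference.strip()
    return float(correct) + alpha        # format bonus for a well-formed \boxed{}
```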
## Limitations
- Scale: Nexus-1.5B operates at 1.54B parameters. Hard olympiad problems (e.g., AIME) remain challenging for models at this scale.
- Language: Primarily optimized for English and Chinese mathematical text. Performance on other languages is not evaluated.
- Domain: Designed for mathematical reasoning. General language understanding or instruction-following tasks are outside the model's training distribution.
- TIR dependency: Tool-integrated reasoning requires a sandboxed Python interpreter at inference time.
## Citation
If you use Nexus-1.5B in your research, please cite:
```bibtex
@techreport{neuriton2026nexus,
  title       = {Nexus-1.5B: Length-Penalized Reward Optimization for Robust Mathematical Reasoning},
  author      = {Neuriton Team},
  institution = {Neuriton},
  year        = {2026},
  month       = {Summer},
  note        = {Technical Report}
}
```
## Acknowledgements
We thank the Qwen Team at Alibaba Group for open-sourcing the Qwen2.5-Math model family, and the authors of DAPO for the asymmetric clipping insight that is central to LPRO.
Developed by Neuriton · Summer 2026