Trained Verifier Models

Aletheia-Bench's Collections

updated 5 days ago

Surrogate code verifiers at three model sizes (1.5B, 7B, and 14B), trained with several different algorithms as described in the Aletheia paper.


Aletheia-Bench/GRPO-Think-1.5B-16k

Text Generation • 2B • Updated Oct 30, 2025 • 25

Note One of our flagship verifiers, trained using a GRPO-style algorithm with a 16k generation limit
Aletheia-Bench/GRPO-Think-7B-16k

Text Generation • 8B • Updated Oct 30, 2025 • 33

Note One of our flagship verifiers, trained using a GRPO-style algorithm with a 16k generation limit
Aletheia-Bench/GRPO-Think-14B-16k

Text Generation • 15B • Updated Nov 3, 2025 • 58

Note One of our flagship verifiers, trained using a GRPO-style algorithm with a 16k generation limit
Aletheia-Bench/GRPO-Think-1.5B-4k

Text Generation • 2B • Updated Dec 2, 2025 • 24

Note A variant of our flagship verifiers, trained with a 4k generation limit
Aletheia-Bench/GRPO-Think-7B-4k

Text Generation • 8B • Updated Dec 4, 2025 • 25

Note A variant of our flagship verifiers, trained with a 4k generation limit
Aletheia-Bench/GRPO-Think-14B-4k

Text Generation • 15B • Updated Dec 8, 2025 • 25

Note A variant of our flagship verifiers, trained with a 4k generation limit
Aletheia-Bench/GRPO-Think-1.5B-8k

Text Generation • 2B • Updated Dec 5, 2025 • 23

Note A variant of our flagship verifiers, trained with an 8k generation limit
Aletheia-Bench/GRPO-Think-7B-8k

Text Generation • 8B • Updated Dec 12, 2025 • 26

Note A variant of our flagship verifiers, trained with an 8k generation limit
Aletheia-Bench/GRPO-Think-14B-8k

Text Generation • 15B • Updated 6 days ago • 35 • 1

Note A variant of our flagship verifiers, trained with an 8k generation limit
Aletheia-Bench/GRPO-Instruct-1.5B

Text Generation • 2B • Updated Nov 27, 2025 • 1

Note A GRPO training ablation that does not generate thinking traces
Aletheia-Bench/GRPO-Instruct-7B

Text Generation • 8B • Updated Nov 26, 2025 • 6

Note A GRPO training ablation that does not generate thinking traces
Aletheia-Bench/GRPO-Instruct-14B

Text Generation • 15B • Updated Nov 30, 2025 • 1

Note A GRPO training ablation that does not generate thinking traces
Aletheia-Bench/DPO-Think-1.5B

Text Generation • 2B • Updated 6 days ago • 15

Note A verifier trained completely offline using DPO
Aletheia-Bench/DPO-Think-7B

Text Generation • 8B • Updated Nov 9, 2025 • 13

Note A verifier trained completely offline using DPO
Aletheia-Bench/DPO-Think-14B

Text Generation • 15B • Updated 6 days ago • 9 • 1

Note A verifier trained completely offline using DPO
Aletheia-Bench/BatchOnline-GRPO-1.5B

Text Generation • 2B • Updated 6 days ago • 6

Note A variant of our flagship verifiers, where the generation policy is synced every 4 gradient updates
Aletheia-Bench/BatchOnline-GRPO-7B

Text Generation • 8B • Updated 6 days ago • 6 • 1

Note A variant of our flagship verifiers, where the generation policy is synced every 4 gradient updates
Aletheia-Bench/BatchOnline-GRPO-14B

Text Generation • 15B • Updated 6 days ago • 44 • 1

Note A variant of our flagship verifiers, where the generation policy is synced every 4 gradient updates
Aletheia-Bench/RAFT-1.5B

2B • Updated Nov 24, 2025 • 60

Note A verifier trained with on-policy rejection sampling, using only positive samples
Aletheia-Bench/RAFT-7B

8B • Updated Dec 7, 2025 • 11

Note A verifier trained with on-policy rejection sampling, using only positive samples
Aletheia-Bench/RAFT-14B

15B • Updated Dec 1, 2025 • 3

Note A verifier trained with on-policy rejection sampling, using only positive samples
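The verifier families listed here differ mainly in how they are trained. As a rough, non-authoritative illustration of those differences, the sketch below implements textbook versions of the techniques named in the notes, in plain Python: a GRPO-style group-normalized advantage, the standard DPO loss for one preference pair, RAFT-style rejection sampling that keeps only positively rewarded samples, and a batch-online schedule that refreshes the generation policy every 4 gradient updates. It assumes a scalar reward per sampled completion (for example 1.0 when the verifier's output is judged correct and 0.0 otherwise) and summed sequence log-probabilities. All function names and parameters (`grpo_advantages`, `dpo_loss`, `raft_filter`, `sync_points`, `beta`, `threshold`) are hypothetical and are not taken from the Aletheia training code.

```python
import math
from typing import List, Sequence, Tuple


def grpo_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages (illustrative): normalize each rollout's reward by the
    mean and population standard deviation of its own group (same prompt)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    # A group where every rollout gets the same reward yields all-zero advantages.
    return [(r - mean) / (std + eps) for r in rewards]


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair, given summed sequence
    log-probabilities under the trained policy and a frozen reference model."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) == softplus(-margin), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def raft_filter(samples: Sequence[Tuple[str, str, float]],
                threshold: float = 1.0) -> List[Tuple[str, str]]:
    """RAFT-style rejection sampling (illustrative): keep only (prompt, completion)
    pairs whose reward reaches the threshold; they become fine-tuning targets."""
    return [(prompt, completion) for prompt, completion, reward in samples
            if reward >= threshold]


def sync_points(num_updates: int, sync_every: int = 4) -> List[int]:
    """Hypothetical batch-online schedule: 1-based gradient-update indices after
    which the generation policy's weights are refreshed from the trained model."""
    return [step for step in range(1, num_updates + 1) if step % sync_every == 0]


if __name__ == "__main__":
    print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
    print(dpo_loss(-1.0, -9.0, -5.0, -5.0))
    print(raft_filter([("p1", "verdict A", 1.0), ("p2", "verdict B", 0.0)]))
    print(sync_points(10))
```

For instance, a group with rewards `[1.0, 0.0, 1.0, 0.0]` gets advantages of roughly `[1, -1, 1, -1]`, so incorrect rollouts are actively pushed down, whereas RAFT simply discards the zero-reward samples and trains only on the positive ones. DPO needs no sampling during training at all, which matches the note that those verifiers were trained completely offline.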
