Trained Verifier Models
Surrogate code verifiers at three model sizes, trained with several different algorithms, as described in the Aletheia paper.
Text Generation • 2B • Note: Our flagship verifiers, trained using a GRPO-style algorithm with a 16k generation limit
Aletheia-Bench/GRPO-Think-7B-16k
Text Generation • 8B • Note: Our flagship verifiers, trained using a GRPO-style algorithm with a 16k generation limit
Aletheia-Bench/GRPO-Think-14B-16k
Text Generation • 15B • Note: Our flagship verifiers, trained using a GRPO-style algorithm with a 16k generation limit
Aletheia-Bench/GRPO-Think-1.5B-4k
Text Generation • 2B • Note: A variant of our flagship verifiers, trained with a 4k generation limit
Aletheia-Bench/GRPO-Think-7B-4k
Text Generation • 8B • Note: A variant of our flagship verifiers, trained with a 4k generation limit
Aletheia-Bench/GRPO-Think-14B-4k
Text Generation • 15B • Note: A variant of our flagship verifiers, trained with a 4k generation limit
Aletheia-Bench/GRPO-Think-1.5B-8k
Text Generation • 2B • Note: A variant of our flagship verifiers, trained with an 8k generation limit
Aletheia-Bench/GRPO-Think-7B-8k
Text Generation • 8B • Note: A variant of our flagship verifiers, trained with an 8k generation limit
Aletheia-Bench/GRPO-Think-14B-8k
Text Generation • 15B • Note: A variant of our flagship verifiers, trained with an 8k generation limit
Aletheia-Bench/GRPO-Instruct-1.5B
Text Generation • 2B • Note: A GRPO training ablation that does not generate thinking traces
Aletheia-Bench/GRPO-Instruct-7B
Text Generation • 8B • Note: A GRPO training ablation that does not generate thinking traces
Aletheia-Bench/GRPO-Instruct-14B
Text Generation • 15B • Note: A GRPO training ablation that does not generate thinking traces
Aletheia-Bench/DPO-Think-1.5B
Text Generation • 2B • Note: A verifier trained completely offline using DPO
Aletheia-Bench/DPO-Think-7B
Text Generation • 8B • Note: A verifier trained completely offline using DPO
Aletheia-Bench/DPO-Think-14B
Text Generation • 15B • Note: A verifier trained completely offline using DPO
Aletheia-Bench/BatchOnline-GRPO-1.5B
Text Generation • 2B • Note: A variant of our flagship verifiers in which the generation policy is synced every 4 gradient updates
Aletheia-Bench/BatchOnline-GRPO-7B
Text Generation • 8B • Note: A variant of our flagship verifiers in which the generation policy is synced every 4 gradient updates
Aletheia-Bench/BatchOnline-GRPO-14B
Text Generation • 15B • Note: A variant of our flagship verifiers in which the generation policy is synced every 4 gradient updates
Aletheia-Bench/RAFT-1.5B
2B • Note: A verifier trained using on-policy rejection sampling on positive samples only
Aletheia-Bench/RAFT-7B
8B • Note: A verifier trained using on-policy rejection sampling on positive samples only
Aletheia-Bench/RAFT-14B
15B • Note: A verifier trained using on-policy rejection sampling on positive samples only
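
The notes above distinguish three training signals: GRPO-style group-relative updates, a batch-online variant that syncs the generation policy every 4 gradient updates, and RAFT-style rejection sampling that keeps only positive samples. The sketch below illustrates those ideas in plain Python. It is not the Aletheia training code: the function names, the binary reward values, the population standard deviation, and the reward threshold are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages (assumed form): center each reward on the
    group mean and scale by the group's population standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def raft_filter(samples, rewards, threshold=1.0):
    """Rejection sampling on positives only: keep the samples whose reward
    meets the (hypothetical) threshold and discard the rest."""
    return [s for s, r in zip(samples, rewards) if r >= threshold]


def should_sync_policy(step, sync_every=4):
    """Batch-online schedule: refresh the generation policy once every
    `sync_every` gradient updates (4 in the notes above)."""
    return step > 0 and step % sync_every == 0


# Hypothetical group of four sampled verdicts with binary correctness rewards.
samples = ["verdict_a", "verdict_b", "verdict_c", "verdict_d"]
rewards = [1.0, 0.0, 0.0, 1.0]

advantages = grpo_advantages(rewards)      # positive for correct, negative for incorrect
positives = raft_filter(samples, rewards)  # only the correct verdicts survive
sync_steps = [s for s in range(1, 9) if should_sync_policy(s)]
```

In this sketch, the GRPO-style update uses every sample in the group, with incorrect ones pushed down by a negative advantage, whereas the RAFT-style filter simply drops them before training.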