The Path Not Taken: RLVR Provably Learns Off the Principals Paper • 2511.08567 • Published Nov 11, 2025 • 33
The Art of Scaling Reinforcement Learning Compute for LLMs Paper • 2510.13786 • Published Oct 15, 2025 • 31
Diffusion Language Models Know the Answer Before Decoding Paper • 2508.19982 • Published Aug 27, 2025 • 25
LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning Paper • 2506.00772 • Published Jun 1, 2025 • 2
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers Paper • 2502.20545 • Published Feb 27, 2025 • 22
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam Paper • 2502.17055 • Published Feb 24, 2025 • 20
Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More Paper • 2502.07490 • Published Feb 11, 2025 • 10
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning Paper • 2501.12570 • Published Jan 22, 2025 • 28
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training Paper • 2501.06842 • Published Jan 12, 2025 • 16
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN Paper • 2412.13795 • Published Dec 18, 2024 • 20
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients Paper • 2407.08296 • Published Jul 11, 2024 • 33