• Large Reasoning Models Learn Better Alignment from Flawed Thinking (arXiv:2510.00938)
• What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT (arXiv:2509.19284)
• Learning to Reason as Action Abstractions with Scalable Mid-Training RL (arXiv:2509.25810)
• Agent Learning via Early Experience (arXiv:2510.08558)
• Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning (arXiv:2506.01939)
• OpenThoughts: Data Recipes for Reasoning Models (arXiv:2506.04178)
• Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy (arXiv:2507.01352)
• First Return, Entropy-Eliciting Explore (arXiv:2507.07017)
• Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning (arXiv:2508.08221)
• Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model (arXiv:2510.18855)
• Revealing the Power of Post-Training for Small Language Models via Knowledge Distillation (arXiv:2509.26497)
• Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation (arXiv:2510.22115)
• Balanced Actor Initialization: Stable RLHF Training of Distillation-Based Reasoning Models (arXiv:2509.00309)
• APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation (arXiv:2509.18521)
• Bridging Offline and Online Reinforcement Learning for LLMs (arXiv:2506.21495)
• On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models (arXiv:2512.07783)
• Stabilizing Reinforcement Learning with LLMs: Formulation and Practices (arXiv:2512.01374)
• arXiv:2505.14674
• Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models (arXiv:2512.13607)