• Large Reasoning Models Learn Better Alignment from Flawed Thinking (arXiv:2510.00938)
• What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT (arXiv:2509.19284)
• Learning to Reason as Action Abstractions with Scalable Mid-Training RL (arXiv:2509.25810)
• Agent Learning via Early Experience (arXiv:2510.08558)
• Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning (arXiv:2506.01939)
• OpenThoughts: Data Recipes for Reasoning Models (arXiv:2506.04178)
• Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy (arXiv:2507.01352)
• First Return, Entropy-Eliciting Explore (arXiv:2507.07017)
• Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning (arXiv:2508.08221)
• Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model (arXiv:2510.18855)
• Revealing the Power of Post-Training for Small Language Models via Knowledge Distillation (arXiv:2509.26497)
• Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation (arXiv:2510.22115)
• Balanced Actor Initialization: Stable RLHF Training of Distillation-Based Reasoning Models (arXiv:2509.00309)
• APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation (arXiv:2509.18521)
• Bridging Offline and Online Reinforcement Learning for LLMs (arXiv:2506.21495)
• On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models (arXiv:2512.07783)
• Stabilizing Reinforcement Learning with LLMs: Formulation and Practices (arXiv:2512.01374)
• arXiv:2505.14674
• Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models (arXiv:2512.13607)