How Far Are We from Genuinely Useful Deep Research Agents? Paper • 2512.01948 • Published 7 days ago • 50
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence Paper • 2511.18538 • Published 15 days ago • 244
DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation Paper • 2511.06307 • Published 29 days ago • 50
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models Paper • 2310.10505 • Published Oct 16, 2023 • 1
Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO Paper • 2505.11595 • Published May 16 • 1
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling Paper • 2508.17445 • Published Aug 24 • 80
Bridging Formal Language with Chain-of-Thought Reasoning to Geometry Problem Solving Paper • 2508.09099 • Published Aug 12
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment Paper • 2505.04113 • Published May 7
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation Paper • 2509.25849 • Published Sep 30 • 47
UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning Paper • 2510.20286 • Published Oct 23 • 23