Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models Paper • 2511.08577 • Published Nov 11, 2025 • 108
When "Correct" Is Not Safe: Can We Trust Functionally Correct Patches Generated by Code Agents? Paper • 2510.17862 • Published Oct 15, 2025 • 7
When "Correct" Is Not Safe: Can We Trust Functionally Correct Patches Generated by Code Agents? Paper • 2510.17862 • Published Oct 15, 2025 • 7 • 2
Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs? Paper • 2510.01161 • Published Oct 1, 2025 • 14
Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs? Paper • 2510.01161 • Published Oct 1, 2025 • 14 • 2
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation Paper • 2506.09991 • Published Jun 11, 2025 • 55
Harmful Terms and Where to Find Them: Measuring and Modeling Unfavorable Financial Terms and Conditions in Shopping Websites at Scale Paper • 2502.01798 • Published Feb 3, 2025 • 1
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation Paper • 2406.09961 • Published Jun 14, 2024 • 55