Generalization in Monitored Markov Decision Processes (Mon-MDPs) Paper • 2505.08988 • Published May 13, 2025
Sotopia-RL: Reward Design for Social Intelligence Paper • 2508.03905 • Published Aug 5, 2025 • 23
RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents Paper • 2507.03112 • Published Jul 3, 2025 • 34
LIFELONG SOTOPIA: Evaluating Social Intelligence of Language Agents Over Lifelong Social Interactions Paper • 2506.12666 • Published Jun 14, 2025
A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning Paper • 2507.14295 • Published Jul 18, 2025 • 14
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains Paper • 2507.17746 • Published Jul 23, 2025 • 5
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback Paper • 2410.19133 • Published Oct 24, 2024 • 11
Deep reinforcement learning from human preferences Paper • 1706.03741 • Published Jun 12, 2017 • 4
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems Paper • 2502.19328 • Published Feb 26, 2025 • 23
MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences Paper • 2410.02381 • Published Oct 3, 2024 • 1
OpenAgents: An Open Platform for Language Agents in the Wild Paper • 2310.10634 • Published Oct 16, 2023 • 9
SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals Paper • 2406.04784 • Published Jun 7, 2024 • 2
Aviary: training language agents on challenging scientific tasks Paper • 2412.21154 • Published Dec 30, 2024
One STEP at a time: Language Agents are Stepwise Planners Paper • 2411.08432 • Published Nov 13, 2024
Language agents achieve superhuman synthesis of scientific knowledge Paper • 2409.13740 • Published Sep 10, 2024
ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy Paper • 2403.14589 • Published Mar 21, 2024
A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents Paper • 2402.10196 • Published Feb 15, 2024
Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game Paper • 2310.18940 • Published Oct 29, 2023
Reinforcing Language Agents via Policy Optimization with Action Decomposition Paper • 2405.15821 • Published May 23, 2024
Reflexion: Language Agents with Verbal Reinforcement Learning Paper • 2303.11366 • Published Mar 20, 2023 • 6
Hi-Agent: Hierarchical Vision-Language Agents for Mobile Device Control Paper • 2510.14388 • Published Oct 16, 2025
Emergent Social Intelligence Risks in Generative Multi-Agent Systems Paper • 2603.27771 • Published 6 days ago • 50
SocialVeil: Probing Social Intelligence of Language Agents under Communication Barriers Paper • 2602.05115 • Published Feb 4, 2026 • 20
AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios Paper • 2410.19346 • Published Oct 25, 2024
TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence Paper • 2505.24500 • Published May 30, 2025 • 12
Digital Life Project: Autonomous 3D Characters with Social Intelligence Paper • 2312.04547 • Published Dec 7, 2023
One Model, All Roles: Multi-Turn, Multi-Agent Self-Play Reinforcement Learning for Conversational Social Intelligence Paper • 2602.03109 • Published Feb 3, 2026
MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems Paper • 2505.18943 • Published May 25, 2025 • 25
SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents Paper • 2403.08715 • Published Mar 13, 2024 • 21
Rethinking Theory of Mind Benchmarks for LLMs: Towards A User-Centered Perspective Paper • 2504.10839 • Published Apr 15, 2025
InCoder-32B: Code Foundation Model for Industrial Scenarios Paper • 2603.16790 • Published 18 days ago • 306
Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning Paper • 2603.04597 • Published Mar 4, 2026 • 210
MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification Paper • 2603.15726 • Published 19 days ago • 184
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence Paper • 2603.13398 • Published 24 days ago • 152
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data Paper • 2603.15594 • Published 19 days ago • 148
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild Paper • 2603.17187 • Published 18 days ago • 136
Robustness via Retrying: Closed-Loop Robotic Manipulation with Self-Supervised Learning Paper • 1810.03043 • Published Oct 6, 2018