- ExGRPO: Learning to Reason from Experience
  Paper • 2510.02245 • Published • 80
- A Practitioner's Guide to Multi-turn Agentic Reinforcement Learning
  Paper • 2510.01132 • Published • 5
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
  Paper • 2510.04618 • Published • 123
- MixReasoning: Switching Modes to Think
  Paper • 2510.06052 • Published • 21

Collections including paper arxiv:2510.02245
- Less LLM, More Documents: Searching for Improved RAG
  Paper • 2510.02657 • Published • 2
- ExGRPO: Learning to Reason from Experience
  Paper • 2510.02245 • Published • 80
- A Definition of AGI
  Paper • 2510.18212 • Published • 34
- Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds
  Paper • 2511.08892 • Published • 194

- ExGRPO: Learning to Reason from Experience
  Paper • 2510.02245 • Published • 80
- A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
  Paper • 2508.07407 • Published • 98
- rStar2-Agent: Agentic Reasoning Technical Report
  Paper • 2508.20722 • Published • 116
- Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning
  Paper • 2508.19828 • Published • 7

- Open Data Synthesis For Deep Research
  Paper • 2509.00375 • Published • 70
- Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
  Paper • 2509.03403 • Published • 22
- LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations
  Paper • 2509.03405 • Published • 23
- SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
  Paper • 2509.00930 • Published • 4

- LLM Pruning and Distillation in Practice: The Minitron Approach
  Paper • 2408.11796 • Published • 57
- TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
  Paper • 2408.09174 • Published • 52
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 44
- Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
  Paper • 2408.11878 • Published • 63

- HalluGuard: Evidence-Grounded Small Reasoning Models to Mitigate Hallucinations in Retrieval-Augmented Generation
  Paper • 2510.00880 • Published
- Position: Privacy Is Not Just Memorization!
  Paper • 2510.01645 • Published • 1
- Less LLM, More Documents: Searching for Improved RAG
  Paper • 2510.02657 • Published • 2
- ExGRPO: Learning to Reason from Experience
  Paper • 2510.02245 • Published • 80

- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
  Paper • 2410.23743 • Published • 63
- Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
  Paper • 2411.03562 • Published • 68
- Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
  Paper • 2411.03884 • Published • 28
- MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models
  Paper • 2502.00698 • Published • 24