Collections

Collections including paper arxiv:2510.02245

Learning from examples - training/inference

ExGRPO: Learning to Reason from Experience

Paper • 2510.02245 • Published Oct 2 • 80
A Practitioner's Guide to Multi-turn Agentic Reinforcement Learning

Paper • 2510.01132 • Published Oct 1 • 5
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Paper • 2510.04618 • Published Oct 6 • 123
MixReasoning: Switching Modes to Think

Paper • 2510.06052 • Published Oct 7 • 21

data for ai minecraft

Less LLM, More Documents: Searching for Improved RAG

Paper • 2510.02657 • Published Oct 3 • 2
ExGRPO: Learning to Reason from Experience

Paper • 2510.02245 • Published Oct 2 • 80
A Definition of AGI

Paper • 2510.18212 • Published Oct 21 • 34
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

Paper • 2511.08892 • Published 25 days ago • 194

ExGRPO: Learning to Reason from Experience

Paper • 2510.02245 • Published Oct 2 • 80
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

Paper • 2508.07407 • Published Aug 10 • 98
rStar2-Agent: Agentic Reasoning Technical Report

Paper • 2508.20722 • Published Aug 28 • 116
Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning

Paper • 2508.19828 • Published Aug 27 • 7

Open Data Synthesis For Deep Research

Paper • 2509.00375 • Published Aug 30 • 70
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

Paper • 2509.03403 • Published Sep 3 • 22
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations

Paper • 2509.03405 • Published Sep 3 • 23
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs

Paper • 2509.00930 • Published Aug 31 • 4

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21, 2024 • 57
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Paper • 2408.09174 • Published Aug 17, 2024 • 52
To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20, 2024 • 44
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Paper • 2408.11878 • Published Aug 20, 2024 • 63

Rethinking Entropy Regularization in Large Reasoning Models

Paper • 2509.25133 • Published Sep 29 • 4
ExGRPO: Learning to Reason from Experience

Paper • 2510.02245 • Published Oct 2 • 80
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense

Paper • 2510.07242 • Published Oct 8 • 30

HalluGuard: Evidence-Grounded Small Reasoning Models to Mitigate Hallucinations in Retrieval-Augmented Generation

Paper • 2510.00880 • Published Oct 1
Position: Privacy Is Not Just Memorization!

Paper • 2510.01645 • Published Oct 2 • 1
Less LLM, More Documents: Searching for Improved RAG

Paper • 2510.02657 • Published Oct 3 • 2
ExGRPO: Learning to Reason from Experience

Paper • 2510.02245 • Published Oct 2 • 80

Model collections trained using ExGRPO.

rzzhan/ExGRPO-Qwen2.5-Math-7B-Zero

Text Generation • 8B • Updated Oct 24 • 20
rzzhan/ExGRPO-LUFFY-7B-Continual

Text Generation • 8B • Updated Oct 24 • 17
rzzhan/ExGRPO-Qwen2.5-7B-Instruct

Text Generation • 8B • Updated Oct 24 • 14
rzzhan/ExGRPO-Qwen2.5-Math-1.5B-Zero

Text Generation • 2B • Updated Oct 24 • 10

What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective

Paper • 2410.23743 • Published Oct 31, 2024 • 63
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

Paper • 2411.03562 • Published Nov 5, 2024 • 68
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models

Paper • 2411.03884 • Published Nov 6, 2024 • 28
MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models

Paper • 2502.00698 • Published Feb 2 • 24
