-
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
Paper • 2502.08910 • Published • 148 -
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper • 2502.06703 • Published • 152 -
The Curse of Depth in Large Language Models
Paper • 2502.05795 • Published • 40
Collections
Discover the best community collections!
Collections including paper arxiv:2502.06703
-
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper • 2503.24290 • Published • 62 -
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Paper • 2503.18878 • Published • 119 -
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 113 -
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 142
-
Evolving Deeper LLM Thinking
Paper • 2501.09891 • Published • 115 -
PaSa: An LLM Agent for Comprehensive Academic Paper Search
Paper • 2501.10120 • Published • 53 -
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong
Paper • 2501.09775 • Published • 33 -
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
Paper • 2501.10132 • Published • 22
-
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
Paper • 2502.04416 • Published • 12 -
Competitive Programming with Large Reasoning Models
Paper • 2502.06807 • Published • 68 -
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper • 2502.06703 • Published • 152 -
GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
Paper • 2507.21033 • Published • 20
-
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper • 2408.03314 • Published • 63 -
Training Compute-Optimal Large Language Models
Paper • 2203.15556 • Published • 11 -
Scaling Laws for Precision
Paper • 2411.04330 • Published • 8 -
Transcending Scaling Laws with 0.1% Extra Compute
Paper • 2210.11399 • Published
-
Large Language Diffusion Models
Paper • 2502.09992 • Published • 123 -
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
Paper • 2502.08946 • Published • 191 -
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models
Paper • 2502.07346 • Published • 54 -
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper • 2502.06703 • Published • 152
-
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU
Paper • 2403.06504 • Published • 56 -
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper • 2502.06703 • Published • 152 -
Slamming: Training a Speech Language Model on One GPU in a Day
Paper • 2502.15814 • Published • 69
-
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
Paper • 2502.08910 • Published • 148 -
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper • 2502.06703 • Published • 152 -
The Curse of Depth in Large Language Models
Paper • 2502.05795 • Published • 40
-
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper • 2408.03314 • Published • 63 -
Training Compute-Optimal Large Language Models
Paper • 2203.15556 • Published • 11 -
Scaling Laws for Precision
Paper • 2411.04330 • Published • 8 -
Transcending Scaling Laws with 0.1% Extra Compute
Paper • 2210.11399 • Published
-
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper • 2503.24290 • Published • 62 -
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Paper • 2503.18878 • Published • 119 -
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 113 -
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 142
-
Evolving Deeper LLM Thinking
Paper • 2501.09891 • Published • 115 -
PaSa: An LLM Agent for Comprehensive Academic Paper Search
Paper • 2501.10120 • Published • 53 -
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong
Paper • 2501.09775 • Published • 33 -
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
Paper • 2501.10132 • Published • 22
-
Large Language Diffusion Models
Paper • 2502.09992 • Published • 123 -
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
Paper • 2502.08946 • Published • 191 -
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models
Paper • 2502.07346 • Published • 54 -
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper • 2502.06703 • Published • 152
-
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
Paper • 2502.04416 • Published • 12 -
Competitive Programming with Large Reasoning Models
Paper • 2502.06807 • Published • 68 -
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper • 2502.06703 • Published • 152 -
GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
Paper • 2507.21033 • Published • 20
-
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU
Paper • 2403.06504 • Published • 56 -
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper • 2502.06703 • Published • 152 -
Slamming: Training a Speech Language Model on One GPU in a Day
Paper • 2502.15814 • Published • 69