The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2, 2025 • 228
Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling Paper • 2508.16745 • Published Aug 22, 2025 • 29
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL Paper • 2508.13167 • Published Aug 6, 2025 • 129
Agent Lightning: Train ANY AI Agents with Reinforcement Learning Paper • 2508.03680 • Published Aug 5, 2025 • 122
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent Paper • 2312.10003 • Published Dec 15, 2023 • 44
view article Article No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL +4 Jun 3, 2025 • 96
view article Article LeRobot goes to driving school: World’s largest open-source self-driving dataset Mar 11, 2025 • 103
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Dec 6, 2024 • 876
view article Article Orchestration of Experts: The First-Principle Multi-Model System May 30, 2024 • 15
RAFT: Adapting Language Model to Domain Specific RAG Paper • 2403.10131 • Published Mar 15, 2024 • 72
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12, 2024 • 44
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU Paper • 2403.06504 • Published Mar 11, 2024 • 56
SaulLM-7B: A pioneering Large Language Model for Law Paper • 2403.03883 • Published Mar 6, 2024 • 89
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21, 2024 • 116
Self-Discover: Large Language Models Self-Compose Reasoning Structures Paper • 2402.03620 • Published Feb 6, 2024 • 117