Collections
Discover the best community collections!

Collections including paper arxiv:2507.16815

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 23
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

- ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning
  Paper • 2507.16815 • Published • 39
- Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning
  Paper • 2507.16814 • Published • 21
- Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning
  Paper • 2503.18013 • Published • 20
- VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
  Paper • 2504.08837 • Published • 43

- Gaussian Splatting with Discretized SDF for Relightable Assets
  Paper • 2507.15629 • Published • 23
- Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
  Paper • 2507.16784 • Published • 122
- Step-Audio 2 Technical Report
  Paper • 2507.16632 • Published • 73
- HOComp: Interaction-Aware Human-Object Composition
  Paper • 2507.16813 • Published • 12

- A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
  Paper • 2507.01925 • Published • 38
- DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge
  Paper • 2507.04447 • Published • 44
- A Survey on Vision-Language-Action Models for Autonomous Driving
  Paper • 2506.24044 • Published • 14
- EmbRACE-3K: Embodied Reasoning and Action in Complex Environments
  Paper • 2507.10548 • Published • 36

- Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning
  Paper • 2506.06205 • Published • 30
- BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation
  Paper • 2506.07530 • Published • 20
- Ark: An Open-source Python-based Framework for Robot Learning
  Paper • 2506.21628 • Published • 16
- RoboBrain 2.0 Technical Report
  Paper • 2507.02029 • Published • 33

- Gemini Robotics: Bringing AI into the Physical World
  Paper • 2503.20020 • Published • 29
- Magma: A Foundation Model for Multimodal AI Agents
  Paper • 2502.13130 • Published • 58
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 51
- OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
  Paper • 2410.23218 • Published • 49

- Unified Vision-Language-Action Model
  Paper • 2506.19850 • Published • 27
- SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
  Paper • 2506.01844 • Published • 143
- 3D-VLA: A 3D Vision-Language-Action Generative World Model
  Paper • 2403.09631 • Published • 11
- QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
  Paper • 2312.14457 • Published • 1

- InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
  Paper • 2502.11573 • Published • 9
- Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
  Paper • 2502.02339 • Published • 22
- video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
  Paper • 2502.11775 • Published • 9
- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
  Paper • 2412.18319 • Published • 39