Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2506.21539

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40
Fast and Simplex: 2-Simplicial Attention in Triton

Paper • 2507.02754 • Published Jul 3, 2025 • 25
IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction

Paper • 2507.02025 • Published Jul 2, 2025 • 35
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact

Paper • 2507.00951 • Published Jul 1, 2025 • 23

WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40

Dynamic View Synthesis as an Inverse Problem

Paper • 2506.08004 • Published Jun 9, 2025 • 5
Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition

Paper • 2506.17201 • Published Jun 20, 2025 • 57
WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40

MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations

Paper • 2504.07830 • Published Apr 10, 2025 • 18
WORLDMEM: Long-term Consistent World Simulation with Memory

Paper • 2504.12369 • Published Apr 16, 2025 • 35
Towards a Unified Copernicus Foundation Model for Earth Vision

Paper • 2503.11849 • Published Mar 14, 2025 • 4
VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory

Paper • 2506.18903 • Published Jun 23, 2025 • 22

Continual Learning - VLA

VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model

Paper • 2509.09372 • Published Sep 11, 2025 • 243
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models

Paper • 2510.01623 • Published Oct 2, 2025 • 10
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2, 2025 • 228
WMPO: World Model-based Policy Optimization for Vision-Language-Action Models

Paper • 2511.09515 • Published Nov 12, 2025 • 18

WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40
LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation

Paper • 2509.05263 • Published Sep 5, 2025 • 10
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators

Paper • 2510.00406 • Published Oct 1, 2025 • 65
GigaBrain-0: A World Model-Powered Vision-Language-Action Model

Paper • 2510.19430 • Published Oct 22, 2025 • 50

Vision Language Models for Robotics

Unified Vision-Language-Action Model

Paper • 2506.19850 • Published Jun 24, 2025 • 27
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2, 2025 • 147
3D-VLA: A 3D Vision-Language-Action Generative World Model

Paper • 2403.09631 • Published Mar 14, 2024 • 11
QUAR-VLA: Vision-Language-Action Model for Quadruped Robots

Paper • 2312.14457 • Published Dec 22, 2023 • 1

SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning

Paper • 2504.09081 • Published Apr 12, 2025 • 16
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Paper • 2504.13180 • Published Apr 17, 2025 • 19
Sekai: A Video Dataset towards World Exploration

Paper • 2506.15675 • Published Jun 18, 2025 • 65
WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40

Multimodal Agent

Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25, 2025 • 29
Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published Feb 18, 2025 • 58
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 49

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Continual Learning - VLA

VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model

Paper • 2509.09372 • Published Sep 11, 2025 • 243
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models

Paper • 2510.01623 • Published Oct 2, 2025 • 10
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2, 2025 • 228
WMPO: World Model-based Policy Optimization for Vision-Language-Action Models

Paper • 2511.09515 • Published Nov 12, 2025 • 18

WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40
Fast and Simplex: 2-Simplicial Attention in Triton

Paper • 2507.02754 • Published Jul 3, 2025 • 25
IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction

Paper • 2507.02025 • Published Jul 2, 2025 • 35
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact

Paper • 2507.00951 • Published Jul 1, 2025 • 23

WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40
LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation

Paper • 2509.05263 • Published Sep 5, 2025 • 10
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators

Paper • 2510.00406 • Published Oct 1, 2025 • 65
GigaBrain-0: A World Model-Powered Vision-Language-Action Model

Paper • 2510.19430 • Published Oct 22, 2025 • 50

WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40

Vision Language Models for Robotics

Unified Vision-Language-Action Model

Paper • 2506.19850 • Published Jun 24, 2025 • 27
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2, 2025 • 147
3D-VLA: A 3D Vision-Language-Action Generative World Model

Paper • 2403.09631 • Published Mar 14, 2024 • 11
QUAR-VLA: Vision-Language-Action Model for Quadruped Robots

Paper • 2312.14457 • Published Dec 22, 2023 • 1

Dynamic View Synthesis as an Inverse Problem

Paper • 2506.08004 • Published Jun 9, 2025 • 5
Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition

Paper • 2506.17201 • Published Jun 20, 2025 • 57
WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40

SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning

Paper • 2504.09081 • Published Apr 12, 2025 • 16
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Paper • 2504.13180 • Published Apr 17, 2025 • 19
Sekai: A Video Dataset towards World Exploration

Paper • 2506.15675 • Published Jun 18, 2025 • 65
WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26, 2025 • 40

MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations

Paper • 2504.07830 • Published Apr 10, 2025 • 18
WORLDMEM: Long-term Consistent World Simulation with Memory

Paper • 2504.12369 • Published Apr 16, 2025 • 35
Towards a Unified Copernicus Foundation Model for Earth Vision

Paper • 2503.11849 • Published Mar 14, 2025 • 4
VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory

Paper • 2506.18903 • Published Jun 23, 2025 • 22

Multimodal Agent

Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25, 2025 • 29
Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published Feb 18, 2025 • 58
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 49

Previous
1
2
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs