RynnVLA-002: A Unified Vision-Language-Action and World Model Paper • 2511.17502 • Published Nov 2025 • 24
PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity Paper • 2510.23603 • Published Oct 27 • 22
High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting Paper • 2510.10637 • Published Oct 12 • 12
RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation Paper • 2509.15212 • Published Sep 18 • 21
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research Paper • 2509.13312 • Published Sep 16 • 105
Towards Affordance-Aware Robotic Dexterous Grasping with Human-like Priors Paper • 2508.08896 • Published Aug 12 • 10
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models Paper • 2508.06471 • Published Aug 8 • 192
Article: RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation • Published Aug 11 • 28
ZeroSearch: Incentivize the Search Capability of LLMs without Searching Paper • 2505.04588 • Published May 7 • 65
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 303
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency Paper • 2502.09621 • Published Feb 13 • 28
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published Jan 22 • 90