MAPS: Preserving Vision-Language Representations via Module-Wise Proximity Scheduling for Better Vision-Language-Action Generalization • Paper • 2511.19878 • Published 12 days ago • 1
STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flow • Paper • 2511.20462 • Published 11 days ago • 29
Stanford-ILIAD/prism-qwen25-extra-dinosiglip-224px-0_5b • Image-Text-to-Text • Updated Dec 12, 2024 • 958 • 6