ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration Paper • 2511.21689 • Published 10 days ago • 94
Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models Paper • 2511.17487 • Published 15 days ago • 9
Kimi Linear: An Expressive, Efficient Attention Architecture Paper • 2510.26692 • Published Oct 30 • 114
The End of Manual Decoding: Towards Truly End-to-End Language Models Paper • 2510.26697 • Published Oct 30 • 115
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published Oct 17 • 89
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models Paper • 2510.08559 • Published Oct 9 • 8
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models Paper • 2508.06471 • Published Aug 8 • 192
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2 • 143
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning Paper • 2505.14231 • Published May 20 • 52
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math Paper • 2504.21233 • Published Apr 30 • 49
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation Paper • 2504.17502 • Published Apr 24 • 55
Describe Anything: Detailed Localized Image and Video Captioning Paper • 2504.16072 • Published Apr 22 • 63
ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering Paper • 2504.05506 • Published Apr 7 • 25
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Paper • 2504.12626 • Published Apr 17 • 51