Orr Zohar PRO

orrzohar

https://ai.stanford.edu/~orrzohar/

AI & ML interests

Large Multi-Modal Models, Foundation Models, Video Understanding

Recent Activity

updated a dataset about 12 hours ago

orrzohar/BLIP3o-Visual-Reasoning

published a dataset about 12 hours ago

orrzohar/BLIP3o-Visual-Reasoning

upvoted a paper 3 days ago

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

Paper • 2511.21689 • Published 10 days ago • 94

upvoted a paper 11 days ago

Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models

Paper • 2511.17487 • Published 15 days ago • 9

upvoted 2 papers about 1 month ago

Kimi Linear: An Expressive, Efficient Attention Architecture

Paper • 2510.26692 • Published Oct 30 • 114

The End of Manual Decoding: Towards Truly End-to-End Language Models

Paper • 2510.26697 • Published Oct 30 • 115

upvoted 4 papers about 2 months ago

SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models

Paper • 2510.08559 • Published Oct 9 • 8

upvoted a paper 4 months ago

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Paper • 2508.06471 • Published Aug 8 • 192

upvoted an article 5 months ago

TimeScope: How Long Can Your Video Large Multimodal Model Go?

Article • Jul 23

upvoted a paper 6 months ago

SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2 • 143

upvoted 3 papers 7 months ago

UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning

Paper • 2505.14231 • Published May 20 • 52

Scaling Law for Quantization-Aware Training

Paper • 2505.14302 • Published May 20 • 76

Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math

Paper • 2504.21233 • Published Apr 30 • 49

upvoted 6 papers 8 months ago

RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation

Paper • 2504.17502 • Published Apr 24 • 55

Describe Anything: Detailed Localized Image and Video Captioning

Paper • 2504.16072 • Published Apr 22 • 63

FlowReasoner: Reinforcing Query-Level Meta-Agents

Paper • 2504.15257 • Published Apr 21 • 47

Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published Apr 21 • 88

ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering

Paper • 2504.05506 • Published Apr 7 • 25

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published Apr 17 • 51
