Video understanding
Wolf: Captioning Everything with a World Summarization Framework
Paper
•
2407.18908
•
Published
•
32
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
Paper
•
2407.19985
•
Published
•
37
TPDiff: Temporal Pyramid Video Diffusion Model
Paper
•
2503.09566
•
Published
•
45
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
Paper
•
2506.07464
•
Published
•
14
Video models are zero-shot learners and reasoners
Paper
•
2509.20328
•
Published
•
100
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
Paper
•
2510.05034
•
Published
•
50
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence
Paper
•
2510.20579
•
Published
•
56
Video Reasoning without Training
Paper
•
2510.17045
•
Published
•
8
Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning
Paper
•
2510.23473
•
Published
•
85
Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark
Paper
•
2510.26802
•
Published
•
34
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks
Paper
•
2511.15065
•
Published
•
77
V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models
Paper
•
2511.16668
•
Published
•
55
In-Video Instructions: Visual Signals as Generative Control
Paper
•
2511.19401
•
Published
•
32
InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision
Paper
•
2512.01342
•
Published
•
18
ViDiC: Video Difference Captioning
Paper
•
2512.03405
•
Published
•
28
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
Paper
•
2512.04678
•
Published
•
41
Evaluating Gemini Robotics Policies in a Veo World Simulator
Paper
•
2512.10675
•
Published
•
19
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
Paper
•
2512.13874
•
Published
•
17
End-to-End Training for Autoregressive Video Diffusion via Self-Resampling
Paper
•
2512.15702
•
Published
•
15
Kling-Omni Technical Report
Paper
•
2512.16776
•
Published
•
169
SemanticGen: Video Generation in Semantic Space
Paper
•
2512.20619
•
Published
•
93
LongVideoAgent: Multi-Agent Reasoning with Long Videos
Paper
•
2512.20618
•
Published
•
54
Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations
Paper
•
2512.21004
•
Published
•
13
Inference-time Physics Alignment of Video Generative Models with Latent World Models
Paper
•
2601.10553
•
Published
•
12
Rethinking Video Generation Model for the Embodied World
Paper
•
2601.15282
•
Published
•
42
Self-Refining Video Sampling
Paper
•
2601.18577
•
Published
•
22