Remember, Retrieve and Generate: Understanding Infinite Visual Concepts
as Your Personalized Assistant
Paper
• 2410.13360
• Published
• 9
Note Worth watching
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Paper
• 2411.18203
• Published
• 40
Towards Interpreting Visual Information Processing in Vision-Language
Models
Paper
• 2410.07149
• Published
• 1
Understanding Alignment in Multimodal LLMs: A Comprehensive Study
Paper
• 2407.02477
• Published
• 24
Enhancing Instruction-Following Capability of Visual-Language Models by
Reducing Image Redundancy
Paper
• 2411.15453
• Published
Large Multi-modal Models Can Interpret Features in Large Multi-modal
Models
Paper
• 2411.14982
• Published
• 19
I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token
Paper
• 2412.06676
• Published
• 9
Note Decent
From Uncertainty to Trust: Enhancing Reliability in Vision-Language
Models with Uncertainty-Guided Dropout Decoding
Paper
• 2412.06474
• Published
Note Hard to say
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary
Embedding Distillation
Paper
• 2412.09585
• Published
• 11
Note Worth watching
SynerGen-VL: Towards Synergistic Image Understanding and Generation with
Vision Experts and Token Folding
Paper
• 2412.09604
• Published
• 38
Note Decent
Analyzing The Language of Visual Tokens
Paper
• 2411.05001
• Published
• 24
Note Worth watching
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via
Hierarchical Window Transformer
Paper
• 2412.13871
• Published
• 18
FastVLM: Efficient Vision Encoding for Vision Language Models
Paper
• 2412.13303
• Published
• 75
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Paper
• 2412.18619
• Published
• 60
Note Keep following
Task Preference Optimization: Improving Multimodal Large Language Models
with Vision Task Alignment
Paper
• 2412.19326
• Published
• 18
Explanatory Instructions: Towards Unified Vision Tasks Understanding and
Zero-shot Generalization
Paper
• 2412.18525
• Published
• 74
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
Paper
• 2501.01904
• Published
• 33