CAMV
CoRAG: Collaborative Retrieval-Augmented Generation
Paper
• 2504.01883
• Published
• 9
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models
with Reinforcement Learning
Paper
• 2504.08837
• Published
• 43
Mavors: Multi-granularity Video Representation for Multimodal Large
Language Model
Paper
• 2504.10068
• Published
• 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper
• 2504.10481
• Published
• 85
Efficient Generative Model Training via Embedded Representation Warmup
Paper
• 2504.10188
• Published
• 12
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to
Reinforce
Paper
• 2504.11343
• Published
• 19
NormalCrafter: Learning Temporally Consistent Normals from Video
Diffusion Priors
Paper
• 2504.11427
• Published
• 19
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper
• 2504.11536
• Published
• 63
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation
Paper
• 2504.09454
• Published
• 11
Genius: A Generalizable and Purely Unsupervised Self-Training Framework
For Advanced Reasoning
Paper
• 2504.08672
• Published
• 55
Generate, but Verify: Reducing Hallucination in Vision-Language Models
with Retrospective Resampling
Paper
• 2504.13169
• Published
• 39
DMM: Building a Versatile Image Generation Model via Distillation-Based
Model Merging
Paper
• 2504.12364
• Published
• 22
Iterative Self-Training for Code Generation via Reinforced Re-Ranking
Paper
• 2504.09643
• Published
• 34
DataDecide: How to Predict Best Pretraining Data with Small Experiments
Paper
• 2504.11393
• Published
• 18
Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution
Paper
• 2504.09566
• Published
• 11
AlayaDB: The Data Foundation for Efficient and Effective Long-context
LLM Inference
Paper
• 2504.10326
• Published
• 25
InstantCharacter: Personalize Any Characters with a Scalable Diffusion
Transformer Framework
Paper
• 2504.12395
• Published
• 16
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Paper
• 2504.13055
• Published
• 19
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper
• 2504.10449
• Published
• 15
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models
Paper
• 2504.05303
• Published
• 5
IAAO: Interactive Affordance Learning for Articulated Objects in 3D
Environments
Paper
• 2504.06827
• Published
Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls
for Video Generation
Paper
• 2504.14899
• Published
• 20
Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through
the Lens of Internal Representations
Paper
• 2504.13816
• Published
• 18
X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents
Paper
• 2504.13203
• Published
• 35
MIG: Automatic Data Selection for Instruction Tuning by Maximizing
Information Gain in Semantic Space
Paper
• 2504.13835
• Published
• 38
LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient
Training of Code LLMs
Paper
• 2504.14655
• Published
• 21
OTC: Optimal Tool Calls via Reinforcement Learning
Paper
• 2504.14870
• Published
• 35
FlowReasoner: Reinforcing Query-Level Meta-Agents
Paper
• 2504.15257
• Published
• 47
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks
Paper
• 2504.15521
• Published
• 64
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making
Abilities
Paper
• 2504.16078
• Published
• 21
Vidi: Large Multimodal Models for Video Understanding and Editing
Paper
• 2504.15681
• Published
• 14
From Reflection to Perfection: Scaling Inference-Time Optimization for
Text-to-Image Diffusion Models via Reflection Tuning
Paper
• 2504.16080
• Published
• 15
Pre-DPO: Improving Data Utilization in Direct Preference Optimization
Using a Guiding Reference Model
Paper
• 2504.15843
• Published
• 16
I-Con: A Unifying Framework for Representation Learning
Paper
• 2504.16929
• Published
• 30
Tina: Tiny Reasoning Models via LoRA
Paper
• 2504.15777
• Published
• 56
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal
Large Language Models
Paper
• 2504.15279
• Published
• 78
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery
Simulation
Paper
• 2504.17207
• Published
• 30
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image
Generation
Paper
• 2504.17502
• Published
• 55
Token-Shuffle: Towards High-Resolution Image Generation with
Autoregressive Models
Paper
• 2504.17789
• Published
• 23
The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs
Paper
• 2504.17768
• Published
• 14
BitNet v2: Native 4-bit Activations with Hadamard Transformation for
1-bit LLMs
Paper
• 2504.18415
• Published
• 49
DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large
Language Models
Paper
• 2504.15716
• Published
• 12
Can Large Language Models Help Multimodal Language Analysis? MMLA: A
Comprehensive Benchmark
Paper
• 2504.16427
• Published
• 18
RepText: Rendering Visual Text via Replicating
Paper
• 2504.19724
• Published
• 31
YoChameleon: Personalized Vision and Language Generation
Paper
• 2504.20998
• Published
• 12
UniversalRAG: Retrieval-Augmented Generation over Multiple Corpora with
Diverse Modalities and Granularities
Paper
• 2504.20734
• Published
• 62
WebThinker: Empowering Large Reasoning Models with Deep Research
Capability
Paper
• 2504.21776
• Published
• 59
Sadeed: Advancing Arabic Diacritization Through Small Language Model
Paper
• 2504.21635
• Published
• 59
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language
Models in Math
Paper
• 2504.21233
• Published
• 49
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for
Scalable and Generalizable Robot Learning
Paper
• 2504.18904
• Published
• 9
DeepCritic: Deliberate Critique with Large Language Models
Paper
• 2505.00662
• Published
• 54
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level
and Token-level CoT
Paper
• 2505.00703
• Published
• 44
Self-Generated In-Context Examples Improve LLM Agents for Sequential
Decision-Making Tasks
Paper
• 2505.00234
• Published
• 26
ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D
Physics Modeling for Complex Motion and Interaction
Paper
• 2504.21855
• Published
• 13
Improving Editability in Image Generation with Layer-wise Memory
Paper
• 2505.01079
• Published
• 29
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG
Evaluation Prompts
Paper
• 2504.21117
• Published
• 26
Ming-Lite-Uni: Advancements in Unified Architecture for Natural
Multimodal Interaction
Paper
• 2505.02471
• Published
• 15
Agentic Reasoning and Tool Integration for LLMs via Reinforcement
Learning
Paper
• 2505.01441
• Published
• 39
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive
Streaming Speech Synthesis
Paper
• 2505.02625
• Published
• 23
A Survey on Inference Engines for Large Language Models: Perspectives on
Optimization and Efficiency
Paper
• 2505.01658
• Published
• 39
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper
• 2505.03335
• Published
• 189
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM
Inference
Paper
• 2505.02922
• Published
• 28
Benchmarking LLMs' Swarm intelligence
Paper
• 2505.04364
• Published
• 20
StreamBridge: Turning Your Offline Video Large Language Model into a
Proactive Streaming Assistant
Paper
• 2505.05467
• Published
• 13
R&B: Domain Regrouping and Data Mixture Balancing for Efficient
Foundation Model Training
Paper
• 2505.00358
• Published
• 26
OSUniverse: Benchmark for Multimodal GUI-navigation AI Agents
Paper
• 2505.03570
• Published
• 8
LLM-Independent Adaptive RAG: Let the Question Speak for Itself
Paper
• 2505.04253
• Published
• 14
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in
Large Language Models
Paper
• 2505.02847
• Published
• 29
X-Reasoner: Towards Generalizable Reasoning Across Modalities and
Domains
Paper
• 2505.03981
• Published
• 15
Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM
Reasoners With Verifiers
Paper
• 2505.04842
• Published
• 12
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision
Encoders for Multimodal Learning
Paper
• 2505.04601
• Published
• 29
HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene
Generation
Paper
• 2504.21650
• Published
• 16
PrimeIntellect/INTELLECT-2
33B • Updated
• 29
• 205
Unified Continuous Generative Models
Paper
• 2505.07447
• Published
• 42
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable
Speaker Encoder
Paper
• 2505.07916
• Published
• 134
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for
Image Analysis
Paper
• 2505.09358
• Published
• 27
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal
Mathematical Reasoning
Paper
• 2505.10557
• Published
• 47
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Paper
• 2505.10320
• Published
• 24
OpenThinkIMG: Learning to Think with Images via Visual Tool
Reinforcement Learning
Paper
• 2505.08617
• Published
• 42
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large
Reasoning Models
Paper
• 2505.10554
• Published
• 120
Parallel Scaling Law for Language Models
Paper
• 2505.10475
• Published
• 83
Exploring the Deep Fusion of Large Language Models and Diffusion
Transformers for Text-to-Image Synthesis
Paper
• 2505.10046
• Published
• 9
MMLongBench: Benchmarking Long-Context Vision-Language Models
Effectively and Thoroughly
Paper
• 2505.10610
• Published
• 55
Simple Semi-supervised Knowledge Distillation from Vision-Language
Models via Dual-Head Optimization
Paper
• 2505.07675
• Published
• 21
AdaptThink: Reasoning Models Can Learn When to Think
Paper
• 2505.13417
• Published
• 83
Chain-of-Model Learning for Language Model
Paper
• 2505.11820
• Published
• 121
Thinkless: LLM Learns When to Think
Paper
• 2505.13379
• Published
• 50
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable
Step-Level Supervision
Paper
• 2505.13427
• Published
• 26
CPGD: Toward Stable Rule-based Reinforcement Learning for Language
Models
Paper
• 2505.12504
• Published
• 24
Improving Assembly Code Performance with Large Language Models via
Reinforcement Learning
Paper
• 2505.11480
• Published
• 8
Visual Agentic Reinforcement Fine-Tuning
Paper
• 2505.14246
• Published
• 32
Paper
• 2505.14674
• Published
• 37
MMaDA: Multimodal Large Diffusion Language Models
Paper
• 2505.15809
• Published
• 98
Emerging Properties in Unified Multimodal Pretraining
Paper
• 2505.14683
• Published
• 133
Vid2World: Crafting Video Diffusion Models to Interactive World Models
Paper
• 2505.14357
• Published
• 27
Scaling Computer-Use Grounding via User Interface Decomposition and
Synthesis
Paper
• 2505.13227
• Published
• 45
Think Only When You Need with Large Hybrid-Reasoning Models
Paper
• 2505.14631
• Published
• 20
Diffusion vs. Autoregressive Language Models: A Text Embedding
Perspective
Paper
• 2505.15045
• Published
• 55
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data
Could Be Secretly Stolen!
Paper
• 2505.15656
• Published
• 15
RLVR-World: Training World Models with Reinforcement Learning
Paper
• 2505.13934
• Published
• 16
Scaling Reasoning, Losing Control: Evaluating Instruction Following in
Large Reasoning Models
Paper
• 2505.14810
• Published
• 62
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel
Decoding
Paper
• 2505.16990
• Published
• 22
NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop
System from Hypothesis to Verification
Paper
• 2505.16938
• Published
• 121
QuickVideo: Real-Time Long Video Understanding with System Algorithm
Co-Design
Paper
• 2505.16175
• Published
• 42
Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal
Large Language Models
Paper
• 2505.17015
• Published
• 9
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with
Curiosity-Driven Reinforcement Learning
Paper
• 2505.15966
• Published
• 53
Distilling LLM Agent into Small Models with Retrieval and Code Tools
Paper
• 2505.17612
• Published
• 81
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Paper
• 2505.18129
• Published
• 62
QwenLong-L1: Towards Long-Context Large Reasoning Models with
Reinforcement Learning
Paper
• 2505.17667
• Published
• 88
Reasoning Model is Stubborn: Diagnosing Instruction Overriding in
Reasoning Models
Paper
• 2505.17225
• Published
• 64
Shifting AI Efficiency From Model-Centric to Data-Centric Compression
Paper
• 2505.19147
• Published
• 145
Enigmata: Scaling Logical Reasoning in Large Language Models with
Synthetic Verifiable Puzzles
Paper
• 2505.19914
• Published
• 46
Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System
Collaboration
Paper
• 2505.20256
• Published
• 19
Synthetic Data RL: Task Definition Is All You Need
Paper
• 2505.17063
• Published
• 11
Interleaved Reasoning for Large Language Models via Reinforcement
Learning
Paper
• 2505.19640
• Published
• 15
Alchemist: Turning Public Text-to-Image Data into Generative Gold
Paper
• 2505.19297
• Published
• 84
s3: You Don't Need That Much Data to Train a Search Agent via RL
Paper
• 2505.14146
• Published
• 19
FullFront: Benchmarking MLLMs Across the Full Front-End Engineering
Workflow
Paper
• 2505.17399
• Published
• 14
MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent
Systems
Paper
• 2505.18943
• Published
• 25
Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering
Target Atoms
Paper
• 2505.20322
• Published
• 14
DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via
Next-Detail Prediction
Paper
• 2505.21473
• Published
• 16
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in
Video Scenarios
Paper
• 2505.21333
• Published
• 38
Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM
Reasoning
Paper
• 2505.17813
• Published
• 58
GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient
Fine-Tuning
Paper
• 2505.20355
• Published
• 36
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided
Iterative Policy Optimization
Paper
• 2505.19000
• Published
• 42
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
Paper
• 2505.22453
• Published
• 46
VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual
Tool Selection
Paper
• 2505.20289
• Published
• 10
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic
Scientific Workflows
Paper
• 2505.19897
• Published
• 104
rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale
Verified Dataset
Paper
• 2505.21297
• Published
• 29
ZeroGUI: Automating Online GUI Learning at Zero Human Cost
Paper
• 2505.23762
• Published
• 45
UniRL: Self-Improving Unified Multimodal Models via Supervised and
Reinforcement Learning
Paper
• 2505.23380
• Published
• 22
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial
Intelligence
Paper
• 2505.23747
• Published
• 69
Muddit: Liberating Generation Beyond Text-to-Image with a Unified
Discrete Diffusion Model
Paper
• 2505.23606
• Published
• 14
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV
Cache and Parallel Decoding
Paper
• 2505.22618
• Published
• 45
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in
Large Language Models
Paper
• 2505.24864
• Published
• 143
Taming LLMs by Scaling Learning Rates with Gradient Grouping
Paper
• 2506.01049
• Published
• 38
More Thinking, Less Seeing? Assessing Amplified Hallucination in
Multimodal Reasoning Models
Paper
• 2505.21523
• Published
• 13
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
Paper
• 2506.02096
• Published
• 52
Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning
Paper
• 2506.03136
• Published
• 25
Visual Embodied Brain: Let Multimodal Large Language Models See, Think,
and Control in Spaces
Paper
• 2506.00123
• Published
• 35
LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon
Embodied Tasks
Paper
• 2506.00411
• Published
• 31
DINGO: Constrained Inference for Diffusion LLMs
Paper
• 2505.23061
• Published
• 31
Incentivizing Reasoning for Advanced Instruction-Following of Large
Language Models
Paper
• 2506.01413
• Published
• 16
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for
Over-Reasoning Mitigation
Paper
• 2506.02397
• Published
• 36
ComposeAnything: Composite Object Priors for Text-to-Image Generation
Paper
• 2505.24086
• Published
• 5
Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate
Video Diffusion Transformers
Paper
• 2506.03065
• Published
• 27
From Token to Action: State Machine Reasoning to Mitigate Overthinking
in Information Retrieval
Paper
• 2505.23059
• Published
• 13
DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via
Diffusion Transformers
Paper
• 2505.21541
• Published
• 7
CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning
Capabilities of VLMs
Paper
• 2505.24120
• Published
• 49
VideoREPA: Learning Physics for Video Generation through Relational
Alignment with Foundation Models
Paper
• 2505.23656
• Published
• 25
Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning
Capabilities Through Evaluation Design
Paper
• 2506.04734
• Published
• 21
Image Editing As Programs with Diffusion Models
Paper
• 2506.04158
• Published
• 24
Search Arena: Analyzing Search-Augmented LLMs
Paper
• 2506.05334
• Published
• 18
Aligning Latent Spaces with Flow Priors
Paper
• 2506.05240
• Published
• 27
Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports
From Scratch with Agentic Framework
Paper
• 2506.02454
• Published
• 7
FlexPainter: Flexible and Multi-View Consistent Texture Generation
Paper
• 2506.02620
• Published
• 14
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal
Contextual Fusion
Paper
• 2506.01111
• Published
• 31
Audio-Aware Large Language Models as Judges for Speaking Styles
Paper
• 2506.05984
• Published
• 15
Splatting Physical Scenes: End-to-End Real-to-Sim from Imperfect Robot
Data
Paper
• 2506.04120
• Published
• 7
ConfQA: Answer Only If You Are Confident
Paper
• 2506.07309
• Published
• 10
Through the Valley: Path to Effective Long CoT Training for Small
Language Models
Paper
• 2506.07712
• Published
• 18
PartCrafter: Structured 3D Mesh Generation via Compositional Latent
Diffusion Transformers
Paper
• 2506.05573
• Published
• 82
Vision Transformers Don't Need Trained Registers
Paper
• 2506.08010
• Published
• 22
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video
Diffusion Models
Paper
• 2506.07177
• Published
• 23
Squeeze3D: Your 3D Generation Model is Secretly an Extreme Neural
Compressor
Paper
• 2506.07932
• Published
• 12
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation
Paper
• 2506.09790
• Published
• 53
SAFE: Multitask Failure Detection for Vision-Language-Action Models
Paper
• 2506.09937
• Published
• 9
Ming-Omni: A Unified Multimodal Model for Perception and Generation
Paper
• 2506.09344
• Published
• 31
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical
Reasoning
Paper
• 2506.09513
• Published
• 102
AutoMind: Adaptive Knowledgeable Agent for Automated Data Science
Paper
• 2506.10974
• Published
• 19
Paper
• 2506.10910
• Published
• 66
Comment on The Illusion of Thinking: Understanding the Strengths and
Limitations of Reasoning Models via the Lens of Problem Complexity
Paper
• 2506.09250
• Published
• 27
AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven
Clip Generation
Paper
• 2506.10540
• Published
• 37
Aligned Novel View Image and Geometry Synthesis via Cross-modal
Attention Instillation
Paper
• 2506.11924
• Published
• 34
Marrying Autoregressive Transformer and Diffusion with Multi-Reference
Autoregression
Paper
• 2506.09482
• Published
• 45
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes
Correct Reasoning in Base LLMs
Paper
• 2506.14245
• Published
• 45
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
Paper
• 2506.06962
• Published
• 28
Scaling Test-time Compute for LLM Agents
Paper
• 2506.12928
• Published
• 63
Reasoning with Exploration: An Entropy Perspective
Paper
• 2506.14758
• Published
• 31
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark
for Financial LLM Evaluation
Paper
• 2506.14028
• Published
• 93
Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in
LLMs
Paper
• 2506.19290
• Published
• 53
Chain-of-Experts: Unlocking the Communication Power of
Mixture-of-Experts Models
Paper
• 2506.18945
• Published
• 40
Learning to Skip the Middle Layers of Transformers
Paper
• 2506.21103
• Published
• 18