M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page
Multi-document Understanding
Paper
• 2411.04952
• Published
• 29
Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion
Models
Paper
• 2411.05005
• Published
• 13
M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for
Evaluating Foundation Models
Paper
• 2411.04075
• Published
• 16
Self-Consistency Preference Optimization
Paper
• 2411.04109
• Published
• 19
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge
in RAG Systems
Paper
• 2411.02959
• Published
• 71
The Lessons of Developing Process Reward Models in Mathematical
Reasoning
Paper
• 2501.07301
• Published
• 100
Transformer²: Self-adaptive LLMs
Paper
• 2501.06252
• Published
• 55
Evaluating Sample Utility for Data Selection by Mimicking Model Weights
Paper
• 2501.06708
• Published
• 5
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper
• 2501.08313
• Published
• 300
3DIS-FLUX: simple and efficient multi-instance generation with DiT
rendering
Paper
• 2501.05131
• Published
• 37
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token
Marks
Paper
• 2501.08326
• Published
• 34
HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
Paper
• 2501.08292
• Published
• 17
FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration
Paper
• 2502.01068
• Published
• 18
Improving Transformer World Models for Data-Efficient RL
Paper
• 2502.01591
• Published
• 9
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
Paper
• 2501.19324
• Published
• 39
Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing
Guardrail Moderation
Paper
• 2501.17433
• Published
• 10
AnimeGamer: Infinite Anime Life Simulation with Next Game State
Prediction
Paper
• 2504.01014
• Published
• 70
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in
One Step
Paper
• 2504.01956
• Published
• 41
Towards Physically Plausible Video Generation via VLM Planning
Paper
• 2503.23368
• Published
• 40
VisualCloze: A Universal Image Generation Framework via Visual
In-Context Learning
Paper
• 2504.07960
• Published
• 50
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for
Autoregressive Image Generation
Paper
• 2504.08736
• Published
• 46
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper
• 2504.11536
• Published
• 63
AlayaDB: The Data Foundation for Efficient and Effective Long-context
LLM Inference
Paper
• 2504.10326
• Published
• 25
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion
Transformers
Paper
• 2504.10483
• Published
• 22
Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution
Paper
• 2504.09566
• Published
• 11
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for
Language Model Pre-training
Paper
• 2504.13161
• Published
• 93
Generate, but Verify: Reducing Hallucination in Vision-Language Models
with Retrospective Resampling
Paper
• 2504.13169
• Published
• 39
InstantCharacter: Personalize Any Characters with a Scalable Diffusion
Transformer Framework
Paper
• 2504.12395
• Published
• 16
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU
Inference via Dynamic-Length Float
Paper
• 2504.11651
• Published
• 31
Complex-Edit: CoT-Like Instruction Generation for
Complexity-Controllable Image Editing Benchmark
Paper
• 2504.13143
• Published
• 7
TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through
Task Tokenization
Paper
• 2503.19901
• Published
• 41
Self-Supervised Learning of Motion Concepts by Optimizing
Counterfactuals
Paper
• 2503.19953
• Published
• 3
Reinforcement Learning for Reasoning in Large Language Models with One
Training Example
Paper
• 2504.20571
• Published
• 98
ReasonIR: Training Retrievers for Reasoning Tasks
Paper
• 2504.20595
• Published
• 54
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement
Fine-Tuning
Paper
• 2505.03318
• Published
• 92
Shifting AI Efficiency From Model-Centric to Data-Centric Compression
Paper
• 2505.19147
• Published
• 145
ARM: Adaptive Reasoning Model
Paper
• 2505.20258
• Published
• 45
Enigmata: Scaling Logical Reasoning in Large Language Models with
Synthetic Verifiable Puzzles
Paper
• 2505.19914
• Published
• 46
Paper
• 2505.19752
• Published
• 16
ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning
in LLMs
Paper
• 2506.15211
• Published
• 39
SwarmAgentic: Towards Fully Automated Agentic System Generation via
Swarm Intelligence
Paper
• 2506.15672
• Published
• 15
FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal
Large Language Models
Paper
• 2506.14824
• Published
• 8
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement
Learning
Paper
• 2506.18841
• Published
• 56
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought
Reasoning in LLMs
Paper
• 2506.18896
• Published
• 29
RLPR: Extrapolating RLVR to General Domains without Verifiers
Paper
• 2506.18254
• Published
• 32
TC-Light: Temporally Consistent Relighting for Dynamic Long Videos
Paper
• 2506.18904
• Published
• 10
FaithfulSAE: Towards Capturing Faithful Features with Sparse
Autoencoders without External Dataset Dependencies
Paper
• 2506.17673
• Published
• 7
MMSearch-R1: Incentivizing LMMs to Search
Paper
• 2506.20670
• Published
• 64
FaSTA*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient
Multi-turn Image Editing
Paper
• 2506.20911
• Published
• 41
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
Paper
• 2506.21506
• Published
• 52
Scaling RL to Long Videos
Paper
• 2507.07966
• Published
• 160
A Survey of Context Engineering for Large Language Models
Paper
• 2507.13334
• Published
• 261
LongCodeZip: Compress Long Context for Code Language Models
Paper
• 2510.00446
• Published
• 107
Interactive Training: Feedback-Driven Neural Network Optimization
Paper
• 2510.02297
• Published
• 43
mHC: Manifold-Constrained Hyper-Connections
Paper
• 2512.24880
• Published
• 309
Urban Socio-Semantic Segmentation with Vision-Language Reasoning
Paper
• 2601.10477
• Published
• 155
Transformers without Normalization
Paper
• 2503.10622
• Published
• 170
Reinforcement Pre-Training
Paper
• 2506.08007
• Published
• 263
Group Sequence Policy Optimization
Paper
• 2507.18071
• Published
• 317
Less is More: Recursive Reasoning with Tiny Networks
Paper
• 2510.04871
• Published
• 509
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
Paper
• 2511.22699
• Published
• 238
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
Paper
• 2511.18538
• Published
• 298