Collections
Discover the best community collections!
Collections including paper arxiv:2512.02395
- Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch
  Paper • 2512.02395 • Published • 49
- Skywork/R1V4
  Image-Text-to-Text • Updated • 13
- Skywork/Skywork-VL-Reward-7B
  Image-Text-to-Text • 8B • Updated • 1.58k • 47
- Mungert/Skywork-VL-Reward-7B-GGUF
  Image-Text-to-Text • 8B • Updated • 131

- GUI-G^2: Gaussian Reward Modeling for GUI Grounding
  Paper • 2507.15846 • Published • 133
- WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
  Paper • 2508.05748 • Published • 141
- Mobile-Agent-v3: Foundamental Agents for GUI Automation
  Paper • 2508.15144 • Published • 65
- AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
  Paper • 2508.16153 • Published • 160

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 15
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

- DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
  Paper • 2512.02556 • Published • 258
- Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch
  Paper • 2512.02395 • Published • 49
- Deep Research: A Systematic Survey
  Paper • 2512.02038 • Published • 72
- How Far Are We from Genuinely Useful Deep Research Agents?
  Paper • 2512.01948 • Published • 56

- Two Minds Better Than One: Collaborative Reward Modeling for LLM Alignment
  Paper • 2505.10597 • Published
- COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values
  Paper • 2504.05535 • Published • 44
- nvidia/HelpSteer3
  Viewer • Updated • 133k • 3.81k • 97
- nvidia/Nemotron-RL-instruction_following
  Preview • Updated • 97 • 11

- InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
  Paper • 2502.11573 • Published • 9
- Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
  Paper • 2502.02339 • Published • 23
- video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
  Paper • 2502.11775 • Published • 9
- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
  Paper • 2412.18319 • Published • 39