ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents Paper • 2601.12294 • Published 7 days ago • 16
MCP-RADAR: A Multi-Dimensional Benchmark for Evaluating Tool Use Capabilities in Large Language Models Paper • 2505.16700 • Published May 22, 2025 • 1
ServiceNow-AI/Apriel-Nemotron-15b-Thinker Text Generation • 15B • Updated Nov 10, 2025 • 1.49k • 124
MIRage Collection Official model collection for the paper "Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models" • 2 items • Updated Feb 4, 2025 • 1