Collections
Collections including paper arxiv:2509.10147
-
AgentOrchestra: A Hierarchical Multi-Agent Framework for General-Purpose Task Solving
Paper • 2506.12508 • Published • 1
Trae Agent: An LLM-based Agent for Software Engineering with Test-time Scaling
Paper • 2507.23370 • Published
MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools
Paper • 2509.09734 • Published • 15
LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?
Paper • 2508.01780 • Published • 20
-
SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models
Paper • 2506.04180 • Published • 33
AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation
Paper • 2506.10540 • Published • 37
AutoMind: Adaptive Knowledgeable Agent for Automated Data Science
Paper • 2506.10974 • Published • 19
SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search
Paper • 2507.15245 • Published • 11
-
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
Paper • 2410.02740 • Published • 54
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging
Paper • 2410.01215 • Published • 39
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Paper • 2409.17146 • Published • 121
EuroLLM: Multilingual Language Models for Europe
Paper • 2409.16235 • Published • 29
-
FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design
Paper • 2311.13743 • Published • 1
QuantAgent: Price-Driven Multi-Agent LLMs for High-Frequency Trading
Paper • 2509.09995 • Published • 14
TradingAgents: Multi-Agents LLM Financial Trading Framework
Paper • 2412.20138 • Published • 14
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
Paper • 2509.09677 • Published • 34
-
Open Data Synthesis For Deep Research
Paper • 2509.00375 • Published • 70
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
Paper • 2509.03403 • Published • 22
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations
Paper • 2509.03405 • Published • 23
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
Paper • 2509.00930 • Published • 4
-
LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries
Paper • 2508.15760 • Published • 46
LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?
Paper • 2508.01780 • Published • 20
API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs
Paper • 2304.08244 • Published • 1
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
Paper • 2508.16153 • Published • 158
-
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
Paper • 2410.14059 • Published • 62
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Paper • 2503.05179 • Published • 46
Token-Efficient Long Video Understanding for Multimodal LLMs
Paper • 2503.04130 • Published • 96
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
Paper • 2503.10639 • Published • 53