UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs Paper • 2512.03383 • Published 4 days ago • 3
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration Paper • 2511.21689 • Published 10 days ago • 95
Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models Paper • 2511.18890 • Published 13 days ago • 29
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models Paper • 2511.23319 • Published 9 days ago • 21
WUSH: Near-Optimal Adaptive Transforms for LLM Quantization Paper • 2512.00956 • Published 7 days ago • 17
CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning Paper • 2512.02551 • Published 5 days ago • 11
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published 6 days ago • 77
LightRAG: Simple and Fast Retrieval-Augmented Generation Paper • 2410.05779 • Published Oct 8, 2024 • 22
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26 • 136