Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data Paper • 2511.12609 • Published Nov 2025 • 102
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models Paper • 2508.09834 • Published Aug 13, 2025 • 53
BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity Paper • 2507.08771 • Published Jul 11, 2025 • 9
The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs Paper • 2504.17768 • Published Apr 24, 2025 • 13
SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models Paper • 2503.07605 • Published Mar 10, 2025 • 68
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity Paper • 2411.02335 • Published Nov 4, 2024 • 11
Configurable Foundation Models: Building LLMs from a Modular Perspective Paper • 2409.02877 • Published Sep 4, 2024 • 31
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models Paper • 2406.15718 • Published Jun 22, 2024 • 14
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters Paper • 2406.05955 • Published Jun 10, 2024 • 27
PowerInfer-2: Fast Large Language Model Inference on a Smartphone Paper • 2406.06282 • Published Jun 10, 2024 • 38
Meta Llama 3 Collection • This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Dec 6, 2024 • 872
In deep reinforcement learning, a pruned network is a good network Paper • 2402.12479 • Published Feb 19, 2024 • 19
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21, 2024 • 116
SliceGPT: Compress Large Language Models by Deleting Rows and Columns Paper • 2401.15024 • Published Jan 26, 2024 • 74
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models Paper • 2401.06066 • Published Jan 11, 2024 • 58