MiniMax-01: Scaling Foundation Models with Lightning Attention • arXiv:2501.08313 • Published Jan 14, 2025
Lizard: An Efficient Linearization Framework for Large Language Models • arXiv:2507.09025 • Published Jul 11, 2025
On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective • arXiv:2507.23632 • Published Jul 31, 2025
Hybrid Architectures for Language Models: Systematic Analysis and Design Insights • arXiv:2510.04800 • Published Oct 6, 2025
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention • arXiv:2510.04212 • Published Oct 5, 2025
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models • arXiv:2510.03561 • Published Oct 3, 2025
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning • arXiv:2510.19338 • Published Oct 22, 2025
Kimi Linear: An Expressive, Efficient Attention Architecture • arXiv:2510.26692 • Published Oct 30, 2025
SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space • arXiv:2511.20102 • Published Nov 2025
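A common thread across several of these papers (lightning attention, linearization of LLMs, Kimi Linear, the softmax-expressiveness analysis) is replacing quadratic softmax attention with a kernelized, linear-time variant. The sketch below is a minimal, generic illustration of that idea in PyTorch, assuming an elu+1 feature map; it is not the architecture of any listed paper, and all function names here are illustrative.

```python
# Generic contrast between softmax attention and kernelized "linear" attention.
# Illustrative sketch only; not the method of any specific paper in this list.
import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    # Standard softmax attention: cost grows as O(T^2) in sequence length T.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v):
    # Kernelized attention: replace softmax(q k^T) with phi(q) phi(k)^T, so the
    # (key, value) products can be accumulated into a fixed-size state -> O(T).
    phi = lambda x: F.elu(x) + 1.0                 # positive feature map (common choice)
    q, k = phi(q), phi(k)
    kv = torch.einsum("...td,...te->...de", k, v)  # sum_t phi(k_t) v_t^T
    z = k.sum(dim=-2)                              # sum_t phi(k_t), for normalization
    out = torch.einsum("...td,...de->...te", q, kv)
    denom = torch.einsum("...td,...d->...t", q, z).clamp(min=1e-6)
    return out / denom.unsqueeze(-1)

if __name__ == "__main__":
    T, d = 128, 64
    q, k, v = (torch.randn(1, T, d) for _ in range(3))
    print(softmax_attention(q, k, v).shape, linear_attention(q, k, v).shape)
```

This non-causal version keeps the whole summed state; in causal or decoding settings the same sums become running prefix states updated one token at a time, which is what gives these approaches constant-memory, linear-time generation.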