Collections
Discover the best community collections!
Collections including paper arxiv:2312.00752

- Scaling Instruction-Finetuned Language Models
  Paper • 2210.11416 • Published • 7
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 148
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
  Paper • 2403.05530 • Published • 66
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 65

- Nemotron-4 15B Technical Report
  Paper • 2402.16819 • Published • 46
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 56
- RWKV: Reinventing RNNs for the Transformer Era
  Paper • 2305.13048 • Published • 20
- Reformer: The Efficient Transformer
  Paper • 2001.04451 • Published

- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 148
- Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
  Paper • 2401.09417 • Published • 62
- Vivim: a Video Vision Mamba for Medical Video Object Segmentation
  Paper • 2401.14168 • Published • 2
- HiPPO: Recurrent Memory with Optimal Polynomial Projections
  Paper • 2008.07669 • Published • 1

- Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
  Paper • 2402.05140 • Published • 23
- BitDelta: Your Fine-Tune May Only Be Worth One Bit
  Paper • 2402.10193 • Published • 22
- QLoRA: Efficient Finetuning of Quantized LLMs
  Paper • 2305.14314 • Published • 57
- OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
  Paper • 2402.14658 • Published • 83

- Re3: Generating Longer Stories With Recursive Reprompting and Revision
  Paper • 2210.06774 • Published • 2
- Constitutional AI: Harmlessness from AI Feedback
  Paper • 2212.08073 • Published • 3
- AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
  Paper • 2402.04253 • Published
- Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
  Paper • 2305.19118 • Published

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 626
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 189
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 56
- ResLoRA: Identity Residual Mapping in Low-Rank Adaption
  Paper • 2402.18039 • Published • 11

- Speculative Streaming: Fast LLM Inference without Auxiliary Models
  Paper • 2402.11131 • Published • 43
- Generative Representational Instruction Tuning
  Paper • 2402.09906 • Published • 54
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 109
- BitDelta: Your Fine-Tune May Only Be Worth One Bit
  Paper • 2402.10193 • Published • 22

- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 60
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 148
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
  Paper • 2401.04081 • Published • 73
- hustvl/Vim-tiny
  Updated • 21

- FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder
  Paper • 2401.10032 • Published • 13
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
  Paper • 2401.04658 • Published • 27
- FreeInit: Bridging Initialization Gap in Video Diffusion Models
  Paper • 2312.07537 • Published • 27
- TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing
  Paper • 2312.05605 • Published • 3