- Understanding LLMs: A Comprehensive Overview from Training to Inference
  Paper • 2401.02038 • Published • 65
- Learning To Teach Large Language Models Logical Reasoning
  Paper • 2310.09158 • Published • 1
- ChipNeMo: Domain-Adapted LLMs for Chip Design
  Paper • 2311.00176 • Published • 9
- WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
  Paper • 2308.09583 • Published • 7
Collections
Collections including paper arxiv:2410.02724
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 19
- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 38
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 58
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 22