Collections
Collections including paper arxiv:2407.07612

- The Impact of Positional Encoding on Length Generalization in Transformers
  Paper • 2305.19466 • Published • 2
- Transformers Can Do Arithmetic with the Right Embeddings
  Paper • 2405.17399 • Published • 54
- Teaching Transformers Causal Reasoning through Axiomatic Training
  Paper • 2407.07612 • Published • 2
- Round and Round We Go! What makes Rotary Positional Encodings useful?
  Paper • 2410.06205 • Published • 2

- Teaching Transformers Causal Reasoning through Axiomatic Training
  Paper • 2407.07612 • Published • 2
- Symbolic Learning Enables Self-Evolving Agents
  Paper • 2406.18532 • Published • 12
- Self-Discover: Large Language Models Self-Compose Reasoning Structures
  Paper • 2402.03620 • Published • 117
- a-m-team/AM-Thinking-v1
  Text Generation • 33B • Updated • 772 • 199

- DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention
  Paper • 2309.14327 • Published • 22
- MambaVision: A Hybrid Mamba-Transformer Vision Backbone
  Paper • 2407.08083 • Published • 32
- Memory^3: Language Modeling with Explicit Memory
  Paper • 2407.01178 • Published • 4
- Teaching Transformers Causal Reasoning through Axiomatic Training
  Paper • 2407.07612 • Published • 2

- Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
  Paper • 2402.14848 • Published • 20
- Teaching Large Language Models to Reason with Reinforcement Learning
  Paper • 2403.04642 • Published • 50
- How Far Are We from Intelligent Visual Deductive Reasoning?
  Paper • 2403.04732 • Published • 23
- Learning to Reason and Memorize with Self-Notes
  Paper • 2305.00833 • Published • 5