- Qwen/Qwen3-235B-A22B-Thinking-2507-FP8
  Text Generation • 235B • Updated • 89.5k • 83
- Qwen/Qwen3-235B-A22B-Thinking-2507
  Text Generation • Updated • 59.8k • 398
- Qwen/Qwen3-235B-A22B-Instruct-2507-FP8
  Text Generation • 235B • Updated • 577k • 146
- Qwen/Qwen3-235B-A22B-Instruct-2507
  Text Generation • Updated • 180k • 762
Collections
Collections including paper arxiv:2505.09388
- LLM Pruning and Distillation in Practice: The Minitron Approach
  Paper • 2408.11796 • Published • 58
- TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
  Paper • 2408.09174 • Published • 52
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 45
- Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
  Paper • 2408.11878 • Published • 64

- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 19
- Evaluating Large Language Models Trained on Code
  Paper • 2107.03374 • Published • 8
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 24
- GPT-4 Technical Report
  Paper • 2303.08774 • Published • 7

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 24
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 152
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

- Reinforced Attention Learning
  Paper • 2602.04884 • Published • 28
- Learning to Reason in 13 Parameters
  Paper • 2602.04118 • Published • 6
- LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters
  Paper • 2405.17604 • Published • 3
- mHC-lite: You Don't Need 20 Sinkhorn-Knopp Iterations
  Paper • 2601.05732 • Published • 1

- Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space
  Paper • 2512.24617 • Published • 65
- Recursive Language Models
  Paper • 2512.24601 • Published • 91
- Nested Learning: The Illusion of Deep Learning Architectures
  Paper • 2512.24695 • Published • 44
- DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
  Paper • 2512.02556 • Published • 260

- Neural Machine Translation by Jointly Learning to Align and Translate
  Paper • 1409.0473 • Published • 7
- Attention Is All You Need
  Paper • 1706.03762 • Published • 115
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 26
- Hierarchical Reasoning Model
  Paper • 2506.21734 • Published • 48