Collections

Discover the best community collections!

Collections including paper arxiv:2312.02120

Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI

Paper • 2505.19443 • Published May 26 • 15
Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs

Paper • 2506.19290 • Published Jun 24 • 52
CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks

Paper • 2105.12655 • Published May 25, 2021
StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29, 2024 • 151

Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation

Paper • 2004.09602 • Published Apr 20, 2020
Magicoder: Source Code Is All You Need

Paper • 2312.02120 • Published Dec 4, 2023 • 82

Synthetic Data Generation

Textbooks Are All You Need

Paper • 2306.11644 • Published Jun 20, 2023 • 149
Textbooks Are All You Need II: phi-1.5 technical report

Paper • 2309.05463 • Published Sep 11, 2023 • 88
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Paper • 2305.07759 • Published May 12, 2023 • 36
Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper • 2406.20094 • Published Jun 28, 2024 • 104

Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15, 2024 • 109
How to Train Data-Efficient LLMs

Paper • 2402.09668 • Published Feb 15, 2024 • 42
BitDelta: Your Fine-Tune May Only Be Worth One Bit

Paper • 2402.10193 • Published Feb 15, 2024 • 22
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts

Paper • 2402.09727 • Published Feb 15, 2024 • 38

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 151
ReFT: Reasoning with Reinforced Fine-Tuning

Paper • 2401.08967 • Published Jan 17, 2024 • 31
Tuning Language Models by Proxy

Paper • 2401.08565 • Published Jan 16, 2024 • 22
TrustLLM: Trustworthiness in Large Language Models

Paper • 2401.05561 • Published Jan 10, 2024 • 69

OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs

Paper • 2504.04030 • Published Apr 5 • 1
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding

Paper • 2503.02951 • Published Mar 4 • 33
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Paper • 2406.15877 • Published Jun 22, 2024 • 48
Magicoder: Source Code Is All You Need

Paper • 2312.02120 • Published Dec 4, 2023 • 82

Synthetic Data papers

Papers and important approaches for generating synthetic data

AgentInstruct: Toward Generative Teaching with Agentic Flows

Paper • 2407.03502 • Published Jul 3, 2024 • 51
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12, 2024 • 71
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22, 2024 • 259
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows

Paper • 2402.10379 • Published Feb 16, 2024 • 31

Code LMs Evaluation

Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code

Paper • 2311.07989 • Published Nov 14, 2023 • 26
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Paper • 2310.06770 • Published Oct 10, 2023 • 9
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution

Paper • 2401.03065 • Published Jan 5, 2024 • 11
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming

Paper • 2402.14261 • Published Feb 22, 2024 • 11

Symbolic LLM Reasoning

CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution

Paper • 2401.03065 • Published Jan 5, 2024 • 11
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

Paper • 2401.14196 • Published Jan 25, 2024 • 68
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation

Paper • 2312.14187 • Published Dec 20, 2023 • 50
On the Effectiveness of Large Language Models in Domain-Specific Code Generation

Paper • 2312.01639 • Published Dec 4, 2023 • 2

llm-paper-reading

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Paper • 2312.11514 • Published Dec 12, 2023 • 260
Magicoder: Source Code Is All You Need

Paper • 2312.02120 • Published Dec 4, 2023 • 82
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8, 2024 • 160
Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15, 2024 • 109
