Create README.md
README.md
ADDED
@@ -0,0 +1,22 @@
| 1 |
+
---
|
| 2 |
+
datasets:
|
| 3 |
+
- JoaoJunior/python_java_dataset_APR
|
| 4 |
+
tags:
|
| 5 |
+
- APR
|
| 6 |
+
- AI
|
| 7 |
+
---
|
| 8 |
+
# Introduction
|
| 9 |
+
This model, JoaoJunior/T5_APR_java_python_v4, is a fine-tuned version of Salesforce's pre-trained CodeT5 model. It is designed to understand and generate code, with a specific focus on bug fixing in Python and Java.
|
| 10 |
+
|
| 11 |
+
# Description
|
| 12 |
+
The CodeT5 model was introduced in the paper "CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation". CodeT5 exploits the semantics conveyed by developer-assigned identifiers in the code, which helps it on both code understanding and code generation tasks.
|
| 13 |
+
|
| 14 |
+
JoaoJunior/T5_APR_java_python_v4 was trained on the python_java_dataset_APR dataset, which contains pairs of buggy and fixed code in Python and Java. The dataset was built from the coconut_java2006 and coconut_python2010 datasets of the CoCoNuT project.
|
| 15 |
+
|
| 16 |
+
# Objective
|
| 17 |
+
The primary objective of this model is to identify and fix bugs in Python and Java code. Fine-tuning CodeT5 on the python_java_dataset_APR dataset is intended to teach the model the recurring bug and fix patterns of both languages, so that, given buggy code, it can generate a corrected version.
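
A minimal usage sketch of this objective with the Hugging Face `transformers` library follows. It rests on assumptions this card does not confirm: that the checkpoint loads through `AutoTokenizer` and `AutoModelForSeq2SeqLM`, that the model accepts raw buggy source as input, and that a 512-token input limit, beam search with 5 beams, and 128 new tokens are reasonable settings. The helper names `load_model` and `suggest_fix` are hypothetical.

```python
MODEL_ID = "JoaoJunior/T5_APR_java_python_v4"  # model name taken from this card


def load_model(model_id=MODEL_ID):
    """Download the tokenizer and model (assumes the Auto* classes can load them)."""
    # Imported lazily so the helpers below can be defined without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def suggest_fix(buggy_code, tokenizer, model, max_new_tokens=128, num_beams=5):
    """Generate one candidate fix for buggy_code.

    Assumption: the model takes the raw buggy source as its input text.
    The length and beam settings are illustrative, not values from this card.
    """
    inputs = tokenizer(
        buggy_code, return_tensors="pt", truncation=True, max_length=512
    )
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, num_beams=num_beams
    )
    # Decode the first (highest-scoring) returned sequence.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

To try it, call `tokenizer, model = load_model()` and then `suggest_fix("def add(a, b):\n    return a - b\n", tokenizer, model)`. The output is a candidate patch and should be reviewed or tested before use.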
|
| 18 |
+
|
| 19 |
+
# References
|
| 20 |
+
- CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation by Yue Wang, Weishi Wang, Shafiq Joty, Steven C.H. Hoi
|
| 21 |
+
- python_java_dataset_APR: A dataset containing pairs of buggy and fixed code in Python and Java, created using the CoCoNuT project's coconut_java2006 and coconut_python2010 datasets
|
| 22 |
+
- CoCoNuT: Combining Context-Aware Neural Translation Models using Ensemble for Program Repair
|