# 🚀 CodeTrans Model for Go Code Documentation Generation
This is a pre-trained model for the Go programming language, based on the T5-small architecture. It was first introduced in this repository. The model was trained on tokenized Go code functions and performs best on such inputs.
## 🚀 Quick Start
This CodeTrans model can generate descriptions for Go functions and can be fine-tuned for other Go code-related tasks. It handles unparsed and untokenized Go code, although tokenized code yields better performance.
## ✨ Features
- Based on the `t5-small` model, with its own SentencePiece vocabulary model.
- Uses multi-task training on 13 supervised tasks in software development and 7 unsupervised datasets.
- Fine-tuned for the code documentation generation task on Go functions/methods.
## 📦 Installation
The original README provides no specific installation steps, but using the model requires the `transformers` library; you will also need a backend such as PyTorch, and the `sentencepiece` package for the T5 tokenizer. Install `transformers` with:

```bash
pip install transformers
```
## 💻 Usage Examples
### Basic Usage
Here's how to use this model to generate Go function documentation with the Transformers `SummarizationPipeline`:
```python
from transformers import AutoTokenizer, AutoModelWithLMHead, SummarizationPipeline

pipeline = SummarizationPipeline(
    model=AutoModelWithLMHead.from_pretrained("SEBIS/code_trans_t5_small_code_documentation_generation_go_multitask_finetune"),
    tokenizer=AutoTokenizer.from_pretrained("SEBIS/code_trans_t5_small_code_documentation_generation_go_multitask_finetune", skip_special_tokens=True),
    device=0
)

tokenized_code = "func ( pr * Progress ) needSnapshotAbort ( ) bool { return pr . State == ProgressStateSnapshot && pr . Match >= pr . PendingSnapshot }"
pipeline([tokenized_code])
```
You can run this example in a [Colab notebook](https://github.com/agemagician/CodeTrans/blob/main/prediction/multitask/fine-tuning/function%20documentation%20generation/go/small_model.ipynb).
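The pipeline returns a list of dictionaries, one per input, each with a `summary_text` field containing the generated documentation. If no GPU is available, pass `device=-1` instead of `device=0` to run the pipeline on CPU.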
## 📚 Documentation
### Model Description
The CodeTrans model is based on the `t5-small` model and has its own SentencePiece vocabulary model. It was trained with multi-task training on 13 supervised tasks in the software development domain and 7 unsupervised datasets, then fine-tuned for the code documentation generation task on Go functions/methods.
### Intended Uses & Limitations
The model can generate descriptions for Go functions and can be fine-tuned for other Go code tasks. It works on unparsed and untokenized Go code, but tokenized code gives better results, as the sketch below illustrates.
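As a rough illustration, the `pipeline` object defined in the Usage Examples section accepts both forms. The raw variant below is a hypothetical reconstruction of the same example function; the whitespace-separated variant mirrors the training format:

```python
# Hypothetical raw form of the example function; the tokenized form (spaces
# around every token) matches the model's training data and usually works better.
raw_code = "func (pr *Progress) needSnapshotAbort() bool { return pr.State == ProgressStateSnapshot && pr.Match >= pr.PendingSnapshot }"
tokenized_code = "func ( pr * Progress ) needSnapshotAbort ( ) bool { return pr . State == ProgressStateSnapshot && pr . Match >= pr . PendingSnapshot }"

pipeline([raw_code])        # accepted, but quality may degrade
pipeline([tokenized_code])  # preferred: matches the training format
```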
### Training Data
The supervised training task datasets can be downloaded from Link.
### Training Procedure
#### Multi-task Pretraining
The model was trained on a single TPU Pod V3-8 for a total of half a million steps, using a sequence length of 512 (batch size 4096). It has approximately 220M parameters in total and uses an encoder-decoder architecture. Pretraining used the AdaFactor optimizer with an inverse square root learning rate schedule.
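For reference, a T5-style inverse square root schedule can be sketched as below; the warmup constant is an assumption for illustration, since the README does not specify it:

```python
def inverse_sqrt_lr(step: int, warmup_steps: int = 10_000) -> float:
    """Inverse square root decay: the rate stays constant for the first
    `warmup_steps` updates, then decays proportionally to 1/sqrt(step)."""
    return 1.0 / max(step, warmup_steps) ** 0.5
```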
#### Fine-tuning
The model was then fine-tuned on a single TPU Pod V2-8 for a total of 2,000 steps, using a sequence length of 512 (batch size 256) and only the dataset containing Go code.
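For readers who want to fine-tune this checkpoint further themselves, the sketch below shows a single training step using the `Adafactor` optimizer shipped with `transformers`. The training pair and hyperparameters are hypothetical, and this is a minimal sketch, not the authors' original TPU setup:

```python
from transformers import Adafactor, AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "SEBIS/code_trans_t5_small_code_documentation_generation_go_multitask_finetune"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Hypothetical (tokenized Go code, documentation) training pair.
code = "func Add ( a int , b int ) int { return a + b }"
doc = "Add returns the sum of a and b."

batch = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
labels = tokenizer(doc, return_tensors="pt", truncation=True, max_length=128).input_ids

# Adafactor with a relative-step (time-dependent) learning rate.
optimizer = Adafactor(model.parameters(), lr=None, relative_step=True, warmup_init=True)

model.train()
loss = model(**batch, labels=labels).loss  # standard seq2seq cross-entropy
loss.backward()
optimizer.step()
optimizer.zero_grad()
```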
### Evaluation Results

| Property | Details |
|----------|---------|
| Model Type | CodeTrans model for Go code documentation generation, based on `t5-small` |
| Training Data | Supervised training task datasets can be downloaded from Link |

For the code documentation generation task, the different models achieve the following results on the different programming languages (in BLEU score):
| Language / Model | Python | Java | Go | PHP | Ruby | JavaScript |
|------------------|--------|------|----|-----|------|------------|
| CodeTrans-ST-Small | 17.31 | 16.65 | 16.89 | 23.05 | 9.19 | 13.7 |
| CodeTrans-ST-Base | 16.86 | 17.17 | 17.16 | 22.98 | 8.23 | 13.17 |
| CodeTrans-TF-Small | 19.93 | 19.48 | 18.88 | 25.35 | 13.15 | 17.23 |
| CodeTrans-TF-Base | 20.26 | 20.19 | 19.50 | 25.84 | 14.07 | 18.25 |
| CodeTrans-TF-Large | 20.35 | 20.06 | 19.54 | 26.18 | 14.94 | 18.98 |
| CodeTrans-MT-Small | 19.64 | 19.00 | 19.15 | 24.68 | 14.91 | 15.26 |
| CodeTrans-MT-Base | 20.39 | 21.22 | 19.43 | 26.23 | 15.26 | 16.11 |
| CodeTrans-MT-Large | 20.18 | 21.87 | 19.38 | 26.08 | 15.00 | 16.23 |
| CodeTrans-MT-TF-Small | 19.77 | 20.04 | 19.36 | 25.55 | 13.70 | 17.24 |
| CodeTrans-MT-TF-Base | 19.77 | 21.12 | 18.86 | 25.79 | 14.24 | 18.62 |
| CodeTrans-MT-TF-Large | 18.94 | 21.42 | 18.77 | 26.20 | 14.19 | 18.83 |
| State of the art | 19.06 | 17.65 | 18.07 | 25.16 | 12.16 | 14.90 |
Created by Ahmed Elnaggar | [LinkedIn](https://www.linkedin.com/in/prof-ahmed-elnaggar/) and Wei Ding | [LinkedIn](https://www.linkedin.com/in/wei-ding-92561270/)