Open-source Go language code documentation generation model code_trans_t5_small

Code Trans T5 Small Code Documentation Generation Go Multitask

Developed by SEBIS

Go language code documentation generation model based on T5-small architecture, supporting multi-task processing

Text Generation #Go function summarization #Multi-task pre-training #T5 architecture optimization

Downloads 17

Release Time: 3/2/2022

Model Overview

This model generates function documentation for the Go programming language. It was pre-trained on multiple tasks and works best on tokenized Go functions.

Model Features

Multi-task training

The model was trained on 13 supervised tasks in the software development domain and 7 unsupervised datasets, which is intended to improve generalization across code tasks.

Tokenization optimization

Trained on tokenized Go functions, so it performs best when the input is tokenized in the same way.

TPU-efficient training

Trained on a single TPU Pod V3-8 for 340,000 steps in total.

Model Capabilities

Go function documentation generation

Code understanding

Multi-task processing

Use Cases

Software development

Automatic Go function documentation generation

Automatically generates descriptive documentation for functions in Go codebases.

BLEU score 19.15 (Go, CodeTrans-MT-Small)

Code understanding assistance

Helps developers understand the functionality of complex Go functions.

🚀 CodeTrans model for Go code documentation generation

A pre-trained model for the Go programming language, built on the T5-small architecture and designed for generating code documentation.

🚀 Quick Start

This is a pre-trained model for the Go programming language that uses the T5-small architecture. It was first released in this repository. The model is trained on tokenized Go functions and performs best on such input.

✨ Features

Based on the t5-small model, with its own SentencePiece vocabulary model.
Trained with multi-task training on 13 supervised tasks in the software development domain and 7 unsupervised datasets.
Can generate descriptions for Go functions or be fine-tuned for other Go code tasks.

📦 Installation

No specific installation steps are provided in the original document.

💻 Usage Examples

Basic Usage

Here is how to use this model to generate Go function documentation using Transformers SummarizationPipeline:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, SummarizationPipeline

model_name = "SEBIS/code_trans_t5_small_code_documentation_generation_go_multitask"

# AutoModelForSeq2SeqLM replaces the deprecated AutoModelWithLMHead for T5-style models.
pipeline = SummarizationPipeline(
    model=AutoModelForSeq2SeqLM.from_pretrained(model_name),
    tokenizer=AutoTokenizer.from_pretrained(model_name, skip_special_tokens=True),
    device=0,  # first GPU; use device=-1 to run on CPU
)

# The input is a Go function whose tokens are separated by spaces.
tokenized_code = "func ( pr * Progress ) needSnapshotAbort ( ) bool { return pr . State == ProgressStateSnapshot && pr . Match >= pr . PendingSnapshot   }"
print(pipeline([tokenized_code]))

This example can also be run in a Colab notebook.
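The pipeline returns a list with one dictionary per input, and the generated text sits under the `summary_text` key. As a rough sketch of what might come next, the snippet below wraps such a summary into Go-style `//` comment lines. The `to_go_doc_comment` helper and the sample `results` value are hypothetical: the sample is placeholder text, not actual model output.

```python
import textwrap


def to_go_doc_comment(summary: str, width: int = 77) -> str:
    """Wrap a generated summary into Go-style `//` comment lines."""
    # Collapse runs of whitespace, since generated text may contain stray spaces.
    text = " ".join(summary.split())
    lines = textwrap.wrap(text, width=width) or [""]
    return "\n".join(f"// {line}".rstrip() for line in lines)


# Placeholder shaped like SummarizationPipeline output (NOT real model output).
results = [{"summary_text": "placeholder summary text standing in for the model output"}]

for result in results:
    print(to_go_doc_comment(result["summary_text"]))
```

The width of 77 is an arbitrary choice that keeps each line, including the `// ` prefix, within 80 columns.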

📚 Documentation

Model description

This CodeTrans model is based on the t5-small model and has its own SentencePiece vocabulary model. It was trained with multi-task training on 13 supervised tasks in the software development domain and 7 unsupervised datasets.

Intended uses & limitations

The model can be used to generate descriptions for Go functions or be fine-tuned on other Go code tasks. It accepts unparsed and untokenized Go code, but performance should be better when the code is tokenized.
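Judging by the usage example, the tokenized form is the function's tokens separated by spaces. A minimal sketch of producing that form with a regular expression follows. The pattern and the `tokenize_go` helper are assumptions made for illustration, not the tokenizer used to prepare the training data, and they do not handle comments or rune literals.

```python
import re

# Hypothetical token pattern: string literals, identifiers, numbers,
# common two-character operators, then any other single character.
_GO_TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'                   # interpreted string literals
    r"|`[^`]*`"                            # raw string literals
    r"|[A-Za-z_]\w*"                       # identifiers and keywords
    r"|\d+(?:\.\d+)?"                      # simple integer and float literals
    r"|&&|\|\||==|!=|<=|>=|:=|<-|\+\+|--"  # common two-character operators
    r"|\S"                                 # any other non-space character
)


def tokenize_go(code: str) -> str:
    """Return the Go source with its tokens separated by single spaces."""
    return " ".join(_GO_TOKEN.findall(code))


raw = (
    "func (pr *Progress) needSnapshotAbort() bool { "
    "return pr.State == ProgressStateSnapshot && pr.Match >= pr.PendingSnapshot }"
)
print(tokenize_go(raw))
```

A real Go lexer would be more robust; the regex is only meant to show the shape of the input the model expects.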

Training data

The supervised training tasks datasets can be downloaded on Link

Training procedure

Multi-task Pretraining

The model was trained on a single TPU Pod V3-8 for 340,000 steps in total, using a sequence length of 512 and a batch size of 4096. It uses the encoder-decoder architecture and has approximately 60M parameters in total. The optimizer is AdaFactor with an inverse square root learning rate schedule for pre-training.
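To illustrate what an inverse square root schedule looks like, here is a small sketch. It assumes the common T5-style form, in which the rate is held at `1/sqrt(warmup_steps)` during warm-up and then decays as `1/sqrt(step)`. The `warmup_steps` value of 10,000 is an assumption, since the text above does not state the warm-up length used for this model.

```python
import math

TOTAL_STEPS = 340_000  # total pre-training steps, as stated above


def inverse_sqrt_lr(step: int, warmup_steps: int = 10_000) -> float:
    """Inverse square root schedule; warmup_steps is an assumed value."""
    # Constant during warm-up, then proportional to 1/sqrt(step).
    return 1.0 / math.sqrt(max(step, warmup_steps))


for step in (1, 10_000, 40_000, TOTAL_STEPS):
    print(f"step {step:>7}: lr = {inverse_sqrt_lr(step):.6f}")
```

Under these assumptions the rate starts at 0.01, halves by step 40,000, and keeps shrinking slowly for the rest of training.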

Evaluation results

For the code documentation tasks, different models achieve the following results on different programming languages (in BLEU score):

Test results:

Language / Model	Python	Java	Go	PHP	Ruby	JavaScript
CodeTrans-ST-Small	17.31	16.65	16.89	23.05	9.19	13.7
CodeTrans-ST-Base	16.86	17.17	17.16	22.98	8.23	13.17
CodeTrans-TF-Small	19.93	19.48	18.88	25.35	13.15	17.23
CodeTrans-TF-Base	20.26	20.19	19.50	25.84	14.07	18.25
CodeTrans-TF-Large	20.35	20.06	19.54	26.18	14.94	18.98
CodeTrans-MT-Small	19.64	19.00	19.15	24.68	14.91	15.26
CodeTrans-MT-Base	20.39	21.22	19.43	26.23	15.26	16.11
CodeTrans-MT-Large	20.18	21.87	19.38	26.08	15.00	16.23
CodeTrans-MT-TF-Small	19.77	20.04	19.36	25.55	13.70	17.24
CodeTrans-MT-TF-Base	19.77	21.12	18.86	25.79	14.24	18.62
CodeTrans-MT-TF-Large	18.94	21.42	18.77	26.20	14.19	18.83
State of the art	19.06	17.65	18.07	25.16	12.16	14.90
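The Go column can be read off programmatically. The sketch below copies the Go scores from the table and reports the best model, along with the margin of the multi-task small variant (the one described on this page) over the listed state of the art. The variable names are illustrative only.

```python
# Go BLEU scores copied from the table above.
go_bleu = {
    "CodeTrans-ST-Small": 16.89,
    "CodeTrans-ST-Base": 17.16,
    "CodeTrans-TF-Small": 18.88,
    "CodeTrans-TF-Base": 19.50,
    "CodeTrans-TF-Large": 19.54,
    "CodeTrans-MT-Small": 19.15,
    "CodeTrans-MT-Base": 19.43,
    "CodeTrans-MT-Large": 19.38,
    "CodeTrans-MT-TF-Small": 19.36,
    "CodeTrans-MT-TF-Base": 18.86,
    "CodeTrans-MT-TF-Large": 18.77,
}
state_of_the_art = 18.07

best_model = max(go_bleu, key=go_bleu.get)
margin = round(go_bleu["CodeTrans-MT-Small"] - state_of_the_art, 2)
above_sota = [name for name, score in go_bleu.items() if score > state_of_the_art]

print(f"Best on Go: {best_model} ({go_bleu[best_model]})")
print(f"CodeTrans-MT-Small margin over state of the art: {margin}")
print(f"Models above state of the art: {len(above_sota)} of {len(go_bleu)}")
```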

📄 License

No license information is provided in the original document.

Created by Ahmed Elnaggar and Wei Ding
