Open-source code_trans_t5_base model - Generate descriptive code documentation for Python functions for free

Code Trans T5 Base Code Documentation Generation Python

Developed by SEBIS

A T5-based model specialized in generating descriptive documentation for Python functions

Text Generation #Python Function Summarization #Code Documentation Automation #T5 Architecture Optimization

Downloads 144

Release Time: 3/2/2022

Model Overview

This model is based on the T5 base architecture and was pretrained specifically for the Python programming language. It is primarily used to generate documentation descriptions for Python functions, and it performs best on tokenized Python code.

Model Features

Optimized for Python Code

Specially pretrained and optimized for the Python programming language

Supports Raw Code Input

Can directly process unparsed and untokenized raw Python code

Single-Task Training

Uses a single-task training approach focused solely on code documentation generation

Model Capabilities

Python Function Documentation Generation

Code Summarization

Use Cases

Code Documentation Automation

Function Documentation Generation

Automatically generates descriptive documentation for Python functions

Achieves a BLEU score of 16.86 on Python code (the CodeTrans-ST-Base row in the evaluation results)

Development Tool Integration

IDE Plugin

Integrated into development environments for automatic code documentation generation

🚀 CodeTrans model for code documentation generation in Python

This is a pretrained model for the Python programming language, built on the T5 base model architecture. It was initially released in this repository. The model was trained on tokenized Python functions and performs best on input that is tokenized the same way.

✨ Features

Based on the t5-base model, with its own SentencePiece vocabulary model.
Trained with single-task training on the CodeSearchNet Corpus Python dataset.
Can generate descriptions for Python functions or be fine-tuned for other Python code tasks.
Works on both unparsed/untokenized and tokenized Python code, with better performance on tokenized code.

📦 Installation

No specific installation steps are provided in the original document.

💻 Usage Examples

Basic Usage

Here is how to use this model to generate Python function documentation using Transformers SummarizationPipeline:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, SummarizationPipeline

model_name = "SEBIS/code_trans_t5_base_code_documentation_generation_python"

# AutoModelForSeq2SeqLM replaces the deprecated AutoModelWithLMHead for T5 models.
pipeline = SummarizationPipeline(
    model=AutoModelForSeq2SeqLM.from_pretrained(model_name),
    tokenizer=AutoTokenizer.from_pretrained(model_name, skip_special_tokens=True),
    device=0,  # first CUDA GPU; set device=-1 to run on the CPU
)

# The input is a Python function flattened to one line of space-separated tokens.
tokenized_code = "def e ( message , exit_code = None ) : print_log ( message , YELLOW , BOLD ) if exit_code is not None : sys . exit ( exit_code )"
print(pipeline([tokenized_code]))

Run this example in a Colab notebook.
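The pipeline call in the example returns a list with one dictionary per input, and the generated text sits under the `summary_text` key. The following is a minimal sketch of turning that output into a docstring line. The helper names `extract_summaries` and `as_docstring` are hypothetical, and the sample output is a hand-written stand-in, not real model output.

```python
def extract_summaries(pipeline_output):
    """Return the generated text for each input, stripped of surrounding whitespace."""
    return [item["summary_text"].strip() for item in pipeline_output]


def as_docstring(summary, indent="    "):
    """Format one summary as a single-line docstring at the given indentation."""
    # Escape embedded triple quotes so the result stays a valid string literal.
    safe = summary.replace('"""', '\\"\\"\\"')
    return f'{indent}"""{safe}"""'


# Hand-written stand-in for pipeline([tokenized_code]); not real model output.
sample_output = [{"summary_text": " Print a message and optionally exit . "}]

for summary in extract_summaries(sample_output):
    print(as_docstring(summary))
```

Keeping extraction and formatting separate makes it easy to swap the formatting step for whatever an editor integration expects.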

📚 Documentation

Model description

This CodeTrans model is based on the t5-base model. It has its own SentencePiece vocabulary model. It was trained with single-task training on the CodeSearchNet Corpus Python dataset.

Intended uses & limitations

The model can be used to generate descriptions for Python functions or be fine-tuned on other Python code tasks. It accepts unparsed, untokenized Python code, but performance should be better when the code is tokenized.
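Since tokenized input is expected to work better, raw source can be flattened into space-separated tokens before it is passed to the model. The sketch below uses Python's standard `tokenize` module and drops comments and layout tokens. It is an assumption that this matches the preprocessing used for training (for example, string and number literals are kept as written here). The function name `flatten_python` is hypothetical, and `raw_code` is a reconstruction of the tokenized example from the Basic Usage section.

```python
import io
import tokenize

# Token types that carry only layout or comments and are dropped from the output.
_SKIPPED = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def flatten_python(source):
    """Return the tokens of `source` joined by single spaces on one line."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return " ".join(tok.string for tok in tokens if tok.type not in _SKIPPED)


# Assumed original form of the tokenized example used in Basic Usage.
raw_code = (
    "def e(message, exit_code=None):\n"
    "    print_log(message, YELLOW, BOLD)\n"
    "    if exit_code is not None:\n"
    "        sys.exit(exit_code)\n"
)

print(flatten_python(raw_code))
```

The resulting string can be passed to the pipeline in place of `tokenized_code`.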

🔧 Technical Details

The datasets for the supervised training tasks can be downloaded from Link

📄 License

No license information is provided in the original document.

Evaluation results

For the code documentation tasks, different models achieve the following results on different programming languages (in BLEU score):

Property	Details
Model Type	CodeTrans models (CodeTrans-ST-Small, CodeTrans-ST-Base, etc.)
Training Data	CodeSearchNet Corpus Python dataset, supervised training tasks datasets available at Link

Test results:

Language / Model	Python	Java	Go	PHP	Ruby	JavaScript
CodeTrans-ST-Small	17.31	16.65	16.89	23.05	9.19	13.70
CodeTrans-ST-Base	16.86	17.17	17.16	22.98	8.23	13.17
CodeTrans-TF-Small	19.93	19.48	18.88	25.35	13.15	17.23
CodeTrans-TF-Base	20.26	20.19	19.50	25.84	14.07	18.25
CodeTrans-TF-Large	20.35	20.06	19.54	26.18	14.94	18.98
CodeTrans-MT-Small	19.64	19.00	19.15	24.68	14.91	15.26
CodeTrans-MT-Base	20.39	21.22	19.43	26.23	15.26	16.11
CodeTrans-MT-Large	20.18	21.87	19.38	26.08	15.00	16.23
CodeTrans-MT-TF-Small	19.77	20.04	19.36	25.55	13.70	17.24
CodeTrans-MT-TF-Base	19.77	21.12	18.86	25.79	14.24	18.62
CodeTrans-MT-TF-Large	18.94	21.42	18.77	26.20	14.19	18.83
State of the art	19.06	17.65	18.07	25.16	12.16	14.90
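To make the table easier to read, the short script below finds the best-scoring CodeTrans variant for each language and the gap between the single-task base model (the variant this page describes) and the prior state of the art. The numbers are copied from the table. The names `best_per_language` and `gap_to_baseline` are hypothetical helpers written for this illustration.

```python
LANGUAGES = ["Python", "Java", "Go", "PHP", "Ruby", "JavaScript"]

# BLEU scores copied from the table above, one list per model in LANGUAGES order.
BLEU = {
    "CodeTrans-ST-Small": [17.31, 16.65, 16.89, 23.05, 9.19, 13.70],
    "CodeTrans-ST-Base": [16.86, 17.17, 17.16, 22.98, 8.23, 13.17],
    "CodeTrans-TF-Small": [19.93, 19.48, 18.88, 25.35, 13.15, 17.23],
    "CodeTrans-TF-Base": [20.26, 20.19, 19.50, 25.84, 14.07, 18.25],
    "CodeTrans-TF-Large": [20.35, 20.06, 19.54, 26.18, 14.94, 18.98],
    "CodeTrans-MT-Small": [19.64, 19.00, 19.15, 24.68, 14.91, 15.26],
    "CodeTrans-MT-Base": [20.39, 21.22, 19.43, 26.23, 15.26, 16.11],
    "CodeTrans-MT-Large": [20.18, 21.87, 19.38, 26.08, 15.00, 16.23],
    "CodeTrans-MT-TF-Small": [19.77, 20.04, 19.36, 25.55, 13.70, 17.24],
    "CodeTrans-MT-TF-Base": [19.77, 21.12, 18.86, 25.79, 14.24, 18.62],
    "CodeTrans-MT-TF-Large": [18.94, 21.42, 18.77, 26.20, 14.19, 18.83],
}
STATE_OF_THE_ART = [19.06, 17.65, 18.07, 25.16, 12.16, 14.90]


def best_per_language(scores):
    """Map each language to the (model, score) pair with the highest BLEU."""
    best = {}
    for i, lang in enumerate(LANGUAGES):
        model = max(scores, key=lambda name: scores[name][i])
        best[lang] = (model, scores[model][i])
    return best


def gap_to_baseline(row, baseline):
    """Return per-language score differences (row minus baseline), rounded to 2 places."""
    return [round(a - b, 2) for a, b in zip(row, baseline)]


for lang, (model, score) in best_per_language(BLEU).items():
    print(f"{lang}: {model} ({score:.2f})")

gaps = gap_to_baseline(BLEU["CodeTrans-ST-Base"], STATE_OF_THE_ART)
print(dict(zip(LANGUAGES, gaps)))
```

In this table the single-task base model scores below the prior state of the art on every language, while the transfer-learning and multi-task variants account for the top scores.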

Created by Ahmed Elnaggar and Wei Ding
