# 📝 CodeTrans model for source code summarization in Python

A pre-trained model for summarizing Python source code, built on the `t5-base` model architecture. It was first introduced in this repository. The model is trained on tokenized Python code functions and performs optimally with such input.
## 🚀 Quick Start

The CodeTrans model is designed to generate descriptions for Python functions and can be fine-tuned for other Python code-related tasks. It can handle unparsed and untokenized Python code, but tokenized code yields better performance; a pre-tokenization sketch follows below.
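If your input is raw Python source, you can whitespace-join its lexical tokens to match the format of the example in the Usage section. The exact tokenizer used to prepare the training data is not documented here, so this stdlib-based helper is an illustrative assumption, not the official preprocessing:

```python
# Illustrative only: the model card does not specify the official
# preprocessing, so this stdlib-based tokenizer is an assumption.
import io
import tokenize

def space_tokenize(code: str) -> str:
    """Join the lexical tokens of a Python snippet with single spaces."""
    skip = (tokenize.COMMENT, tokenize.NEWLINE, tokenize.NL,
            tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER)
    toks = tokenize.generate_tokens(io.StringIO(code).readline)
    return " ".join(t.string for t in toks if t.type not in skip)

print(space_tokenize("def add(a, b):\n    return a + b"))
# -> def add ( a , b ) : return a + b
```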
## ✨ Features

- Based on the `t5-base` model with its own SentencePiece vocabulary model.
- Trained using single-task training on a Python source code summarization dataset.
## 📦 Installation

No specific installation steps are provided in the original document.
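As a rough guide, the usage example below relies on the standard Hugging Face stack, so an environment along these lines should work (an assumption, not an official requirement list):

```bash
# Assumed dependencies for the usage example; the original document
# does not pin versions or list requirements.
pip install transformers sentencepiece torch
```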
## 💻 Usage Examples

### Basic Usage

Here is how to use this model to generate Python function documentation using the Transformers `SummarizationPipeline`:
```python
from transformers import AutoTokenizer, AutoModelWithLMHead, SummarizationPipeline

pipeline = SummarizationPipeline(
    model=AutoModelWithLMHead.from_pretrained("SEBIS/code_trans_t5_base_source_code_summarization_python"),
    tokenizer=AutoTokenizer.from_pretrained("SEBIS/code_trans_t5_base_source_code_summarization_python", skip_special_tokens=True),
    device=0
)

tokenized_code = '''with open ( CODE_STRING , CODE_STRING ) as in_file : buf = in_file . readlines ( ) with open ( CODE_STRING , CODE_STRING ) as out_file : for line in buf : if line == " ; Include this text " : line = line + " Include below " out_file . write ( line ) '''
pipeline([tokenized_code])
```
You can run this example in a Colab notebook.
## 📚 Documentation

### Model description

This CodeTrans model is based on the `t5-base` model. It has its own SentencePiece vocabulary model and was trained using single-task training on a Python source code summarization dataset.
### Intended uses & limitations

The model can generate descriptions for Python functions or be fine-tuned for other Python code tasks. It can process unparsed and untokenized Python code, but performance improves with tokenized code. For decoding outside the pipeline helper, see the sketch below.
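For fine-tuning or custom decoding, you can also call the model directly rather than through the pipeline. This is a minimal sketch using the same checkpoint as the Usage section; the input snippet and generation parameters are illustrative assumptions, not values from the original document:

```python
# Minimal direct-generation sketch; max_length and the input snippet
# are illustrative assumptions, not documented settings.
from transformers import AutoTokenizer, AutoModelWithLMHead

checkpoint = "SEBIS/code_trans_t5_base_source_code_summarization_python"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelWithLMHead.from_pretrained(checkpoint)

inputs = tokenizer("def add ( a , b ) : return a + b", return_tensors="pt")
summary_ids = model.generate(inputs["input_ids"], max_length=64)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```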
### Training data

The datasets for the supervised training tasks can be downloaded from Link.
### Evaluation results

For the source code summarization tasks, different models achieve the following results on different programming languages (in BLEU score):

| Property | Details |
|----------|---------|
| Model Type | CodeTrans model for source code summarization in Python |
| Training Data | Can be downloaded from Link |
Test results:

| Language / Model | Python | SQL | C# |
|------------------|--------|-----|----|
| CodeTrans-ST-Small | 8.45 | 17.55 | 19.74 |
| CodeTrans-ST-Base | 9.12 | 15.00 | 18.65 |
| CodeTrans-TF-Small | 10.06 | 17.71 | 20.40 |
| CodeTrans-TF-Base | 10.94 | 17.66 | 21.12 |
| CodeTrans-TF-Large | 12.41 | 18.40 | 21.43 |
| CodeTrans-MT-Small | 13.11 | 19.15 | 22.39 |
| CodeTrans-MT-Base | 13.37 | 19.24 | 23.20 |
| CodeTrans-MT-Large | 13.24 | 19.40 | 23.57 |
| CodeTrans-MT-TF-Small | 12.10 | 18.25 | 22.03 |
| CodeTrans-MT-TF-Base | 10.64 | 16.91 | 21.40 |
| CodeTrans-MT-TF-Large | 12.14 | 19.98 | 21.10 |
| CODE-NN | -- | 18.40 | 20.50 |
## 🔧 Technical Details

The model uses the `t5-base` architecture and has its own SentencePiece vocabulary model. It was trained with single-task training for Python source code summarization.
## 📄 License

No license information is provided in the original document.
Created by Ahmed Elnaggar | [LinkedIn](https://www.linkedin.com/in/prof-ahmed-elnaggar/) and Wei Ding | [LinkedIn](https://www.linkedin.com/in/wei-ding-92561270/)