🚀 CodeTrans Model for Source Code Summarization in C#
A pre-trained model for source code summarization in C#, using the t5-large model architecture.
This is a pre-trained model on the programming language C# that uses the t5-large model architecture. It was first released in this repository. The model is trained on tokenized C# code functions and works best on tokenized C# functions.
✨ Features
Model description
This CodeTrans model is based on the t5-large model. It has its own SentencePiece vocabulary model. It was trained with multi-task training on 13 supervised tasks in the software development domain and 7 unsupervised datasets.
Intended uses & limitations
The model can be used to generate descriptions for C# functions or be fine-tuned for other C# code tasks. It can handle unparsed and untokenized C# code, but it performs better when the C# code is tokenized, as illustrated by the sketch below.
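The original card does not describe the tokenizer used to prepare the training data, so the helper below is only an illustrative sketch: it space-separates identifiers, numbers, and punctuation so that the input resembles the tokenized example shown in the usage section. The function name rough_tokenize_csharp and the regex are assumptions for illustration, not part of the original pipeline.
import re

def rough_tokenize_csharp(code: str) -> str:
    # Illustrative only: split identifiers, numbers, and single
    # punctuation/operator characters, then re-join them with spaces.
    tokens = re.findall(r"[A-Za-z_][A-Za-z_0-9]*|\d+|\S", code)
    return " ".join(tokens)

print(rough_tokenize_csharp("public static int Add(int a, int b) { return a + b; }"))
# public static int Add ( int a , int b ) { return a + b ; }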
💻 Usage Examples
Basic Usage
Here is how to use this model to generate C# function documentation with the Transformers SummarizationPipeline:
from transformers import AutoTokenizer, AutoModelWithLMHead, SummarizationPipeline

pipeline = SummarizationPipeline(
    model=AutoModelWithLMHead.from_pretrained("SEBIS/code_trans_t5_large_source_code_summarization_csharp_multitask"),
    tokenizer=AutoTokenizer.from_pretrained("SEBIS/code_trans_t5_large_source_code_summarization_csharp_multitask", skip_special_tokens=True),
    device=0  # first GPU; use device=-1 to run on CPU
)

tokenized_code = "public static DateTime ParseUnixDateTime ( double unixTime ) { var dt = new DateTime ( CODE_INTEGER , CODE_INTEGER , CODE_INTEGER , CODE_INTEGER , CODE_INTEGER , CODE_INTEGER , CODE_INTEGER , System . DateTimeKind . Utc ) ; dt = dt . AddSeconds ( unixTimeStamp ) . ToLocalTime ( ) ; return dt ; }"
pipeline([tokenized_code])  # returns a list of summaries, e.g. [{"summary_text": "..."}]
You can run this example in the Colab notebook.
📚 Documentation
Training data
The datasets for the supervised training tasks can be downloaded from Link
Training procedure
Multi-task Pretraining
The model was trained on a single TPU Pod V3-8 for a total of 120,000 steps, using a sequence length of 512 (batch size 4096). It has approximately 220M parameters in total and uses the encoder-decoder architecture. Pre-training used the AdaFactor optimizer with an inverse square root learning rate schedule.
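The exact pre-training code is not part of this card. As a rough, non-authoritative sketch, this is one way to set up a comparable Adafactor optimizer with its built-in inverse-square-root style schedule when fine-tuning the checkpoint with the transformers library; the arguments shown are library defaults for this mode, not the confirmed pre-training hyperparameters.
from transformers import AutoModelWithLMHead, Adafactor
from transformers.optimization import AdafactorSchedule

model = AutoModelWithLMHead.from_pretrained(
    "SEBIS/code_trans_t5_large_source_code_summarization_csharp_multitask"
)

# With relative_step=True and warmup_init=True, Adafactor uses its
# internal inverse-square-root style learning rate, so lr stays None.
optimizer = Adafactor(
    model.parameters(),
    lr=None,
    scale_parameter=True,
    relative_step=True,
    warmup_init=True,
)
lr_scheduler = AdafactorSchedule(optimizer)  # exposes the current relative-step learning rate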
Evaluation results
For the source code summarization tasks, different models achieve the following results on different programming languages (in BLEU score):
Test results:
| Language / Model | Python | SQL | C# |
| --- | --- | --- | --- |
| CodeTrans-ST-Small | 8.45 | 17.55 | 19.74 |
| CodeTrans-ST-Base | 9.12 | 15.00 | 18.65 |
| CodeTrans-TF-Small | 10.06 | 17.71 | 20.40 |
| CodeTrans-TF-Base | 10.94 | 17.66 | 21.12 |
| CodeTrans-TF-Large | 12.41 | 18.40 | 21.43 |
| CodeTrans-MT-Small | 13.11 | 19.15 | 22.39 |
| CodeTrans-MT-Base | 13.37 | 19.24 | 23.20 |
| CodeTrans-MT-Large | 13.24 | 19.40 | 23.57 |
| CodeTrans-MT-TF-Small | 12.10 | 18.25 | 22.03 |
| CodeTrans-MT-TF-Base | 10.64 | 16.91 | 21.40 |
| CodeTrans-MT-TF-Large | 12.14 | 19.98 | 21.10 |
| CODE-NN | -- | 18.40 | 20.50 |
Created by Ahmed Elnaggar | LinkedIn and Wei Ding | LinkedIn