code_trans_t5_small Source Code Summarization SQL Open-source Model - Free Generation of Functional Descriptions for SQL Functions

Code Trans T5 Small Source Code Summarization Sql

Developed by SEBIS

A SQL code summarization model based on the T5-small architecture, designed to generate functional descriptions of SQL functions

Text Generation #SQL code summarization #T5-small architecture #Single-task training

Downloads 15

Release Time: 3/2/2022

Model Overview

This model is a Transformer pre-trained on the SQL programming language. It works best on tokenized SQL code and is primarily used for automatic summarization of SQL source code.

Model Features

SQL-specific pre-training

Pre-trained specifically on the SQL programming language for a better understanding of SQL code

Tokenization optimization

Performs best when the input SQL code is tokenized

Single-task training

Trained exclusively on SQL source code summarization, reaching a BLEU score of 17.55 on this task

Model Capabilities

SQL code summarization

Automatic SQL function documentation generation

SQL code comprehension

Use Cases

Code documentation

SQL function documentation generation

Automatically generates functional description documents for SQL functions

BLEU score 17.55 (on SQL summarization task)

Code comprehension assistance

SQL code explanation

Helps developers understand the functionality of complex SQL code

🚀 CodeTrans Model for Source Code Summarization in SQL

This is a pre-trained model for the SQL programming language, built on the T5-small model architecture. It was initially released in this repository. The model was trained on tokenized SQL functions and performs best on tokenized input.

✨ Features

Based on the t5-small model, with its own SentencePiece vocabulary model.
Trained with single-task training on a SQL source code summarization dataset.
Can generate descriptions for SQL functions or be fine-tuned for other SQL code tasks.
Works on unparsed and untokenized SQL code, but performs better with tokenized code.

📦 Installation

The original model card gives no installation steps. The usage example requires the Hugging Face transformers library.

💻 Usage Examples

Basic Usage

Here is how to use this model to generate SQL function documentation with the Transformers SummarizationPipeline:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, SummarizationPipeline

model_name = "SEBIS/code_trans_t5_small_source_code_summarization_sql"

# AutoModelForSeq2SeqLM replaces the deprecated AutoModelWithLMHead for T5 models.
pipeline = SummarizationPipeline(
    model=AutoModelForSeq2SeqLM.from_pretrained(model_name),
    tokenizer=AutoTokenizer.from_pretrained(model_name, skip_special_tokens=True),
    device=0,  # first GPU; use device=-1 to run on CPU
)

# The model works best when SQL tokens are separated by spaces.
tokenized_code = "select time ( col0 ) from tab0"
print(pipeline([tokenized_code]))

Run this example in a Colab notebook.
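The pipeline call returns a list with one dictionary per input, and the generated description is stored under the `summary_text` key. The sketch below shows one way to pull the plain strings out of that result. The helper name `extract_summaries` and the sample output text are hypothetical, chosen only for illustration; the sample is not real model output.

```python
from typing import Dict, List


def extract_summaries(pipeline_output: List[Dict[str, str]]) -> List[str]:
    """Collect the generated descriptions from a summarization pipeline result."""
    # Each item looks like {"summary_text": "..."}; strip stray whitespace.
    return [item["summary_text"].strip() for item in pipeline_output]


# Hypothetical placeholder shaped like a pipeline result for one input.
example_output = [{"summary_text": " placeholder description of the query "}]
print(extract_summaries(example_output))
```

In practice the argument would be the value returned by the pipeline call in the usage example.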

📚 Documentation

Model description

This CodeTrans model is based on the t5-small model and has its own SentencePiece vocabulary model. It was trained with single-task training on the SQL source code summarization dataset.

Intended uses & limitations

The model can be used to generate descriptions of SQL functions or be fine-tuned on other SQL code tasks. It can be used on unparsed and untokenized SQL code, but performance should be better when the SQL code is tokenized.
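Since tokenized input is expected to work better, it can help to normalize raw SQL into space-separated tokens before calling the model. The sketch below is one possible way to do that with a regular expression. The exact tokenization used to build the training data is not described here, so the token pattern and the function name `space_tokenize_sql` are assumptions for illustration only.

```python
import re

# Assumed token pattern (not taken from the model card): string literals,
# numbers, identifiers/keywords, two-character operators, then any other
# single non-space symbol.
_TOKEN_RE = re.compile(
    r"""
    '(?:[^']|'')*'      # single-quoted string literal
  | \d+(?:\.\d+)?       # integer or decimal number
  | \w+                 # identifier or keyword
  | <>|!=|<=|>=         # two-character comparison operators
  | \S                  # any other single symbol, e.g. ( ) , * =
    """,
    re.VERBOSE,
)


def space_tokenize_sql(sql: str) -> str:
    """Return the SQL string with its tokens separated by single spaces."""
    return " ".join(_TOKEN_RE.findall(sql))


print(space_tokenize_sql("select time(col0) from tab0"))
# prints: select time ( col0 ) from tab0
```

The result for this input matches the tokenized string used in the usage example, which is the only tokenization format shown in this document.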

🔧 Technical Details

Training data

The datasets for the supervised training tasks can be downloaded at Link

Evaluation results

For the source code summarization tasks, different models achieve the following results on different programming languages (in BLEU score):

Test results:

Language / Model	Python	SQL	C#
CodeTrans-ST-Small	8.45	17.55	19.74
CodeTrans-ST-Base	9.12	15.00	18.65
CodeTrans-TF-Small	10.06	17.71	20.40
CodeTrans-TF-Base	10.94	17.66	21.12
CodeTrans-TF-Large	12.41	18.40	21.43
CodeTrans-MT-Small	13.11	19.15	22.39
CodeTrans-MT-Base	13.37	19.24	23.20
CodeTrans-MT-Large	13.24	19.40	23.57
CodeTrans-MT-TF-Small	12.10	18.25	22.03
CodeTrans-MT-TF-Base	10.64	16.91	21.40
CodeTrans-MT-TF-Large	12.14	19.98	21.10
CODE-NN	--	18.40	20.50
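To see where this model stands among the listed systems, the SQL column can be ranked directly. The sketch below copies the SQL BLEU scores from the table and assumes that this model corresponds to the CodeTrans-ST-Small row, based on its single-task training and its reported BLEU of 17.55. Variable names are illustrative.

```python
# SQL BLEU scores copied from the results table.
sql_bleu = {
    "CodeTrans-ST-Small": 17.55,
    "CodeTrans-ST-Base": 15.00,
    "CodeTrans-TF-Small": 17.71,
    "CodeTrans-TF-Base": 17.66,
    "CodeTrans-TF-Large": 18.40,
    "CodeTrans-MT-Small": 19.15,
    "CodeTrans-MT-Base": 19.24,
    "CodeTrans-MT-Large": 19.40,
    "CodeTrans-MT-TF-Small": 18.25,
    "CodeTrans-MT-TF-Base": 16.91,
    "CodeTrans-MT-TF-Large": 19.98,
    "CODE-NN": 18.40,
}

# Assumption: this model is the single-task small variant.
this_model = "CodeTrans-ST-Small"

# Sort from highest to lowest BLEU.
ranked = sorted(sql_bleu.items(), key=lambda kv: kv[1], reverse=True)
best_name, best_score = ranked[0]
rank = [name for name, _ in ranked].index(this_model) + 1
gap = round(best_score - sql_bleu[this_model], 2)

print(f"Best on SQL: {best_name} ({best_score})")
print(f"{this_model} ranks {rank} of {len(ranked)}, {gap} BLEU below the best")
```

On these numbers the single-task small model sits in the lower part of the SQL ranking, with the multi-task variants scoring highest.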

Created by Ahmed Elnaggar and Wei Ding
