Code Autocomplete with DistilGPT2 for Python: An Open-Source Model for Free and Smart Python Code Completion

Code Autocomplete Distilgpt2 Python

Developed by shibing624

A Python code auto-completion model based on GPT2, specifically designed for intelligent Python code completion

Supports Multiple Languages

Open Source License: Apache-2.0

Tags: Python code completion, GPT2 lightweight version, IDE plugin integration

Downloads 295

Release Time: 3/2/2022

Model Overview

This model is based on the GPT2 architecture and is specifically designed for intelligent Python code completion, capable of predicting and generating code lines or blocks based on context.

Model Features

Python code specialization

An auto-completion model specifically optimized for Python code

Lightweight model

A lightweight implementation based on DistilGPT2 (about 82M parameters, compared with 124M for the smallest GPT2), which reduces memory and compute requirements

Context-aware

Capable of understanding code context to provide more accurate completion suggestions

Model Capabilities

Code auto-completion

Code line generation

Code block generation

Use Cases

Development tools

IDE plugin

Integrated into development environments to provide code completion functionality

Improves development efficiency and reduces coding errors

Code assistance tool

Helps developers quickly generate common code snippets

Accelerates the development process
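
As a rough illustration of the IDE plugin use case, the sketch below shows how an editor integration might turn a buffer and a cursor position into a prompt, then trim the model output to a single-line suggestion. The names `suggest_line`, `max_context_chars`, and the `generate` callable are hypothetical and are not part of code-autocomplete; `fake_generate` stands in for a real model call so the example runs without downloading anything.

```python
from typing import Callable


def suggest_line(buffer: str, cursor: int,
                 generate: Callable[[str], str],
                 max_context_chars: int = 2000) -> str:
    """Return a single-line completion for the text before the cursor.

    `generate` is assumed to return the prompt followed by the completion,
    which is how a decoded causal-LM output typically looks.
    """
    # Only the text before the cursor is used as context.
    prefix = buffer[:cursor]
    # Keep the most recent characters so the prompt stays small.
    prompt = prefix[-max_context_chars:]
    full_text = generate(prompt)
    # Strip the echoed prompt, if present, to isolate the new text.
    if full_text.startswith(prompt):
        completion = full_text[len(prompt):]
    else:
        completion = full_text
    # A line-level suggestion ends at the first newline.
    return completion.split("\n", 1)[0]


def fake_generate(prompt: str) -> str:
    # Stand-in for a real model call (hypothetical canned output).
    return prompt + " nn\nimport torch.nn.functional as F"


buffer = "import torch\nimport torch.nn as\nprint('done')\n"
cursor = buffer.index("\nprint")  # cursor sits at the end of the second line
suggestion = suggest_line(buffer, cursor, fake_generate)
print(repr(suggestion))
```

In a real plugin, `generate` could wrap either of the loading approaches shown in the Quick Start section; the trimming logic stays the same.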

🚀 GPT2 for Code AutoComplete Model

A code completion plugin for Python that leverages GPT2 to automatically complete lines and blocks of code.

🚀 Quick Start

The open-source repository code-autocomplete supports the GPT2 model. Here's how to use it:

from autocomplete.gpt2_coder import GPT2Coder

m = GPT2Coder("shibing624/code-autocomplete-distilgpt2-python")
print(m.generate('import torch.nn as')[0])

You can also load the model directly with huggingface/transformers.

⚠️ Important Note

Load this model with the GPT2 classes (GPT2Tokenizer and GPT2LMHeadModel), not with classes for other architectures.

import os
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Work around "duplicate OpenMP runtime" errors that occur on some systems.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

tokenizer = GPT2Tokenizer.from_pretrained("shibing624/code-autocomplete-distilgpt2-python")
model = GPT2LMHeadModel.from_pretrained("shibing624/code-autocomplete-distilgpt2-python")

prompts = [
    """from torch import nn
    class LSTM(Module):
        def __init__(self, *,
                     n_tokens: int,
                     embedding_size: int,
                     hidden_size: int,
                     n_layers: int):""",
    """import numpy as np
    import torch
    import torch.nn as""",
    "import java.util.ArrayList",
    "def factorial(n):",
]
for prompt in prompts:
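    input_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors='pt')
    # Generate up to 64 new tokens. max_new_tokens counts tokens, whereas
    # len(prompt) counts characters and would overshoot the intended length.
    # length_penalty and early_stopping only apply to beam search, so they
    # are omitted when sampling.
    outputs = model.generate(input_ids=input_ids,
                             max_new_tokens=64,
                             temperature=1.0,
                             top_k=50,
                             top_p=0.95,
                             repetition_penalty=1.0,
                             do_sample=True,
                             num_return_sequences=1,
                             pad_token_id=tokenizer.eos_token_id)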
    decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(decoded)
    print("=" * 20)

Output Example

from torch import nn
    class LSTM(Module):
        def __init__(self, *,
                     n_tokens: int,
                     embedding_size: int,
                     hidden_size: int,
                     n_layers: int):
            self.embedding_size = embedding_size
====================
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
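
The `top_k` and `top_p` arguments in the generation call restrict which tokens can be sampled at each step. The following is a simplified, pure-Python sketch of that filtering on a toy distribution. It is not the transformers implementation, and the function name `filter_top_k_top_p` and the example probabilities are made up for illustration.

```python
from typing import Dict


def filter_top_k_top_p(probs: Dict[str, float], top_k: int, top_p: float) -> Dict[str, float]:
    """Keep the top_k most likely tokens, then the smallest leading group of
    them whose cumulative probability reaches top_p, and renormalize."""
    # Sort tokens from most to least likely and apply the top-k cut.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Renormalize so the nucleus cut acts on the post-top-k distribution.
    total = sum(p for _, p in ranked)
    ranked = [(tok, p / total) for tok, p in ranked]
    kept = []
    cumulative = 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize the surviving tokens so they sum to 1.
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}


# Toy next-token distribution after "import torch.nn as" (invented numbers).
toy_probs = {"nn": 0.6, "F": 0.25, "optim": 0.1, "utils": 0.05}
filtered = filter_top_k_top_p(toy_probs, top_k=3, top_p=0.8)
print(sorted(filtered))
```

Lower values of `top_k` or `top_p` make completions more conservative, while higher values allow more varied output.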

Model Files

code-autocomplete-distilgpt2-python
├── config.json
├── merges.txt
├── pytorch_model.bin
├── special_tokens_map.json
├── tokenizer_config.json
└── vocab.json
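
If you download the model files manually, a quick check like the following can confirm that a local directory matches the listing above before you load it. This is only a convenience sketch; the helper name `missing_model_files` is hypothetical, and a temporary directory stands in for a real download.

```python
import tempfile
from pathlib import Path

# File names taken from the model file listing.
EXPECTED_FILES = [
    "config.json",
    "merges.txt",
    "pytorch_model.bin",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "vocab.json",
]


def missing_model_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Demonstration: create every file except vocab.json, then run the check.
with tempfile.TemporaryDirectory() as tmp:
    for name in EXPECTED_FILES[:-1]:
        (Path(tmp) / name).touch()
    demo_missing = missing_model_files(tmp)

print(demo_missing)
```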

📚 Documentation

Training Data

The model was trained on source code from the Pytorch_awesome projects.

You can download code-autocomplete and run the following commands:

cd autocomplete
python create_dataset.py

If you want to train the code-autocomplete GPT2 model, refer to https://github.com/shibing624/code-autocomplete/blob/main/autocomplete/gpt2_coder.py

About GPT2

You can try GPT2's full generation capabilities here: https://transformer.huggingface.co/doc/gpt2-large

GPT2 is a model pretrained on English text using a causal language modeling (CLM) objective. It was introduced in the paper "Language Models are Unsupervised Multitask Learners" (Radford et al., 2019) and first released by OpenAI.

Disclaimer: The team releasing GPT-2 also wrote a model card for their model. Content from this model card has been written by the Hugging Face team to complete the information they provided and give specific examples of bias.
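
The causal language modeling objective mentioned above trains the model to predict each token from the tokens to its left. The sketch below illustrates that setup on a hand-split token sequence. The tokens are a toy split rather than GPT2's byte-pair encoding, and `next_token_pairs`, `clm_loss`, and the example probabilities are hypothetical names and values used only for illustration.

```python
import math
from typing import List, Tuple


def next_token_pairs(tokens: List[str]) -> List[Tuple[List[str], str]]:
    """Split a sequence into (context, target) pairs, where each target
    is predicted only from the tokens that precede it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def clm_loss(target_probs: List[float]) -> float:
    """Average negative log-likelihood assigned to the correct next tokens."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)


pairs = next_token_pairs(["import", "torch", ".", "nn", "as", "nn"])
for context, target in pairs:
    print(context, "->", target)

# Invented probabilities that a model assigned to two correct next tokens.
print(clm_loss([0.5, 0.25]))
```

This left-to-right setup is what makes the model suitable for completion: at inference time, the code before the cursor plays the role of the context.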

📄 License

This project is licensed under the Apache-2.0 license.

📖 Citation

@misc{code-autocomplete,
  author = {Xu Ming},
  title = {code-autocomplete: Code AutoComplete with GPT model},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  url = {https://github.com/shibing624/code-autocomplete},
}
