replit-code-v1-3b

`replit-code-v1-3b` is a 2.7B Causal Language Model designed for Code Completion. It offers users a reliable solution for generating code snippets, leveraging advanced techniques and a diverse training dataset.
🧑‍💻 Test it on our Demo Space! 🧑‍💻
✏️ Fine-tuning and Instruct-tuning guides ✏️
✨ Features
- Multilingual Support: Trained on 20 different programming languages, including `Markdown`, `Java`, `JavaScript`, `Python`, and more.
- Large Training Dataset: Trained on 525B tokens, providing rich knowledge for code generation.
- Advanced Techniques: Utilizes state-of-the-art LLM techniques such as Flash Attention, ALiBi positional embeddings, and the LionW optimizer.
📦 Installation
First, install the latest versions of the following dependencies:

- `einops`
- `sentencepiece`
- `torch`
- `transformers`
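For example, you can install all of them with pip:

```bash
pip install -U einops sentencepiece torch transformers
```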
💻 Usage Examples
Basic Usage
```python
from transformers import AutoModelForCausalLM

# load model
model = AutoModelForCausalLM.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)
```
Advanced Usage
To use the optimized Triton implementation of FlashAttention on GPUs with BF16 precision:
```python
# Install dependencies:
# ```
# flash-attn==0.2.8
# triton==2.0.0.dev20221202
# ```
import torch
from transformers import AutoModelForCausalLM, AutoConfig

config = AutoConfig.from_pretrained(
    "replit/replit-code-v1-3b",
    trust_remote_code=True
)
config.attn_config['attn_impl'] = 'triton'

# load model
model = AutoModelForCausalLM.from_pretrained('replit/replit-code-v1-3b', config=config, trust_remote_code=True)
model.to(device='cuda:0', dtype=torch.bfloat16)

# forward pass
x = torch.tensor([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
x = x.to(device='cuda:0')
y = model(x)
```
Tokenizer
```python
from transformers import AutoTokenizer

# load tokenizer
tokenizer = AutoTokenizer.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)

# single input encoding + generation
x = tokenizer.encode('def hello():\n print("hello world")\n', return_tensors='pt')
y = model.generate(x)

# decoding, clean_up_tokenization_spaces=False to ensure syntactical correctness
generated_code = tokenizer.decode(y[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(generated_code)
```
Generation
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)

x = tokenizer.encode('def fibonacci(n): ', return_tensors='pt')
y = model.generate(x, max_length=100, do_sample=True, top_p=0.95, top_k=4, temperature=0.2, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)

# decoding, clean_up_tokenization_spaces=False to ensure syntactical correctness
generated_code = tokenizer.decode(y[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(generated_code)
```
Loading with 8-bit and 4-bit quantization
Loading in 8-bit
```python
# Install additional dependencies:
# ```
# accelerate
# bitsandbytes
# ```
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("replit/replit-code-v1-3b",
                                             trust_remote_code=True,
                                             device_map="auto",
                                             load_in_8bit=True)
```
Loading in 4-bit
```bash
pip install git+https://github.com/huggingface/accelerate.git
pip install git+https://github.com/huggingface/transformers.git
```

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("replit/replit-code-v1-3b",
                                             trust_remote_code=True,
                                             device_map="auto",
                                             load_in_4bit=True)
```
📚 Documentation
Model Description
`replit-code-v1-3b` is a 2.7B Causal Language Model focused on Code Completion. The model has been trained on a subset of the Stack Dedup v1.2 dataset.

The training mixture includes 20 different languages, listed here in descending order of number of tokens:
`Markdown`, `Java`, `JavaScript`, `Python`, `TypeScript`, `PHP`, `SQL`, `JSX`, `reStructuredText`, `Rust`, `C`, `CSS`, `Go`, `C++`, `HTML`, `Vue`, `Ruby`, `Jupyter Notebook`, `R`, `Shell`
The training dataset contains 175B tokens, which were repeated over 3 epochs; in total, `replit-code-v1-3b` has been trained on 525B tokens (~195 tokens per parameter).
The model has been trained on the MosaicML platform with 256 x A100-40GB GPUs, leveraging their latest LLM examples repo.
`replit-code-v1-3b` is powered by state-of-the-art LLM techniques, such as:

- Flash Attention for fast training and inference,
- ALiBi positional embeddings to support variable context length at inference time (see the sketch after this list),
- the LionW optimizer,
- etc.
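Because ALiBi encodes position with attention biases rather than learned position embeddings, the same checkpoint can in principle be run on sequences longer than those seen in training. A minimal sketch of how one might try this, assuming the model's custom MPT-style config exposes a `max_seq_len` field (that field name is an assumption, not something this card documents):

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

# load the custom config and (assumption) raise the maximum sequence length;
# `max_seq_len` is the field name used by MosaicML MPT-style configs, so verify
# it exists on this model's config before relying on it
config = AutoConfig.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)
config.max_seq_len = 4096  # hypothetical value beyond the training context

model = AutoModelForCausalLM.from_pretrained(
    'replit/replit-code-v1-3b',
    config=config,
    trust_remote_code=True,
)

# a longer-than-usual dummy input; ALiBi biases extrapolate to unseen lengths,
# although generation quality at long contexts is not guaranteed
x = torch.randint(0, config.vocab_size, (1, 3000))
with torch.no_grad():
    y = model(x)
```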
Intended Use
Replit intends this model to be used by anyone as a foundational model for application-specific fine-tuning, without strict limitations on commercial use.
Limitations
The pre-training dataset may have contained offensive or inappropriate content even after applying data cleansing filters, and such content may be reflected in model-generated text. We recommend that users exercise reasonable caution when using the model in production systems. Do not use it for any applications that may cause harm or distress to individuals or groups.
Post Processing
Note that, as with all code generation models, post-processing of the generated code is important. In particular, the following post-processing steps are recommended (see the sketch after this list):
- stop generation when the EOS token is encountered
- remove trailing whitespace
- set `max_tokens` to a reasonable value based on your completion use case
- truncate generation at stop words such as `return`, `def`, "```", "\n\n\n" to avoid generating incomplete code when `max_tokens` is larger than the length of the expected generated code
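A minimal sketch of these steps, applied only to the newly generated continuation (the prompt, stop-word list, and `max_new_tokens` value below are illustrative; in the `transformers` API, `max_new_tokens` plays the role of `max_tokens` above):

```python
# illustrative stop words; "\ndef" stops before the model starts a new function
STOP_WORDS = ["\nreturn", "\ndef", "```", "\n\n\n"]

def truncate_at_stop_words(text: str, stop_words=STOP_WORDS) -> str:
    """Cut `text` at the earliest stop word and strip trailing whitespace."""
    cut = len(text)
    for stop in stop_words:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()

# generate with an explicit token budget and EOS stopping,
# then post-process only the tokens that come after the prompt
x = tokenizer.encode('def add(a, b):\n', return_tensors='pt')
y = model.generate(x, max_new_tokens=128, eos_token_id=tokenizer.eos_token_id)
completion = tokenizer.decode(y[0][x.shape[1]:], skip_special_tokens=True,
                              clean_up_tokenization_spaces=False)
print(truncate_at_stop_words(completion))
```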
🔧 Technical Details
The model is trained on the MosaicML platform with 256 x A100-40GB GPUs. It uses advanced techniques like Flash Attention for fast training and inference, ALiBi positional embeddings to support variable context length, and the LionW optimizer.
📄 License
The model checkpoint and vocabulary file are licensed under the Creative Commons license (CC BY-SA-4.0). Under the license, you must give credit to Replit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests that Replit endorses you or your use.
The source code files (`*.py`) are licensed under the Apache 2.0 license.
Contact
For questions and comments about the model, please post in the community section.
📊 Model Information
| Property | Details |
|---|---|
| Model Type | Causal Language Model |
| Training Data | A subset of the Stack Dedup v1.2 dataset; 525B tokens seen in total over 3 epochs |
| Programming Languages | Markdown, Java, JavaScript, Python, TypeScript, PHP, SQL, JSX, reStructuredText, Rust, C, CSS, Go, C++, HTML, Vue, Ruby, Jupyter Notebook, R, Shell |
| Results (HumanEval, pass@1) | 0.219 |
⚠️ Important Note
The pre-training dataset may have contained offensive or inappropriate content even after applying data cleansing filters, and such content may be reflected in model-generated text. Exercise reasonable caution when using the model in production systems.
💡 Usage Tip
Experiment with different decoding methods and parameters to get the best results for your use case. Also, perform post-processing on the generated code to ensure syntactical correctness.
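For example, greedy decoding and low-temperature nucleus sampling can produce noticeably different completions; the sketch below compares the two (the prompt and parameter values are illustrative, and it assumes the tokenizer and model loaded earlier):

```python
prompt = 'def quicksort(arr):\n'
x = tokenizer.encode(prompt, return_tensors='pt')

# greedy decoding: deterministic, often adequate for short, well-constrained completions
greedy = model.generate(x, max_new_tokens=64, do_sample=False)

# nucleus sampling with a low temperature: more varied, still fairly conservative
sampled = model.generate(x, max_new_tokens=64, do_sample=True,
                         top_p=0.95, top_k=4, temperature=0.2,
                         num_return_sequences=1,
                         eos_token_id=tokenizer.eos_token_id)

for name, out in [("greedy", greedy), ("sampled", sampled)]:
    code = tokenizer.decode(out[0], skip_special_tokens=True,
                            clean_up_tokenization_spaces=False)
    print(f"--- {name} ---\n{code}")
```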