replit-code-v1-3b
replit-code-v1-3b is a 2.7B Causal Language Model designed for Code Completion. It is trained on a subset of the Stack Dedup v1.2 dataset and offers high-quality code generation capabilities.
Test it on our Demo Space!
Quick Start
First, install the latest versions of the following dependencies:
einops
sentencepiece
torch
transformers
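For example, you can install them all at once with pip (the package names are exactly those listed above):
pip install -U einops sentencepiece torch transformers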
Then, you can load the model:
from transformers import AutoModelForCausalLM
# load model
model = AutoModelForCausalLM.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)
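If you want to run the model on a GPU or in half precision, you can pass standard from_pretrained options. The snippet below is a sketch; the dtype and device choices are assumptions to adapt to your hardware:
import torch
from transformers import AutoModelForCausalLM
# load weights in bfloat16 to roughly halve memory use (assumes your GPU supports bf16)
model = AutoModelForCausalLM.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True, torch_dtype=torch.bfloat16)
# move the model to the GPU and switch to inference mode
model.to('cuda')
model.eval()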
Features
- Multilingual Support: Trained on 20 different languages, including Markdown, Java, JavaScript, etc.
- Large-Scale Training: Trained on 525B tokens, with 175B tokens repeated over 3 epochs.
- Advanced Techniques: Utilizes state-of-the-art techniques like Flash Attention, AliBi positional embeddings, and the LionW optimizer.
Installation
Install Basic Dependencies
einops
sentencepiece
torch
transformers
Install Dependencies for Optimized Triton Implementation
flash-attn==0.2.8
triton==2.0.0.dev20221202
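After installing these, you would typically request the Triton attention path when loading the model. The snippet below is a hedged sketch that assumes the model exposes an MPT-style attn_config dictionary on its config; check the model's own configuration for the exact field names:
from transformers import AutoConfig, AutoModelForCausalLM
# load the config and request the Triton attention implementation (assumed field name)
config = AutoConfig.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'
# load the model with the modified config
model = AutoModelForCausalLM.from_pretrained('replit/replit-code-v1-3b', config=config, trust_remote_code=True)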
Usage Examples
Basic Usage
import torch
from transformers import AutoModelForCausalLM
# load model
model = AutoModelForCausalLM.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)
# forward pass over a dummy batch of token ids
x = torch.tensor([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
y = model(x)
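The forward pass returns standard causal-language-model outputs; for example, you can inspect the logits (this assumes the output object exposes a logits attribute, as Hugging Face causal LM models typically do):
print(y.logits.shape)  # (batch_size, sequence_length, vocab_size)
next_token_id = y.logits[0, -1].argmax().item()  # greedy prediction for the next token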
Advanced Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)
x = tokenizer.encode('def fibonacci(n): ', return_tensors='pt')
y = model.generate(x, max_length=100, do_sample=True, top_p=0.95, top_k=4, temperature=0.2, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
# decoding, clean_up_tokenization_spaces=False to ensure syntactical correctness
generated_code = tokenizer.decode(y[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(generated_code)
Documentation
Model Description
replit-code-v1-3b is a 2.7B Causal Language Model focused on Code Completion. It has been trained on a subset of the Stack Dedup v1.2 dataset.
The training mixture includes 20 different languages, listed in descending order of number of tokens:
Markdown, Java, JavaScript, Python, TypeScript, PHP, SQL, JSX, reStructuredText, Rust, C, CSS, Go, C++, HTML, Vue, Ruby, Jupyter Notebook, R, Shell
In total, the training dataset contains 175B tokens, repeated over 3 epochs, so replit-code-v1-3b has been trained on 525B tokens (~195 tokens per parameter).
The model was trained on the MosaicML platform with 256 x A100-40GB GPUs, using their latest LLM examples repo.
Intended Use
Replit intends this model to be used as a foundational model for application-specific fine-tuning, with no strict limitations on commercial use.
Limitations
The pre-training dataset may contain offensive or inappropriate content even after data cleansing. Such content may appear in the model's generated text. Users should exercise caution when using it in production systems and avoid using it for applications that may cause harm.
Tokenizer
We trained a custom SentencePiece Unigram tokenizer with a 32768-token vocabulary optimized for code. Using it requires the sentencepiece library.
from transformers import AutoModelForCausalLM, AutoTokenizer
# load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)
# single input encoding + generation
x = tokenizer.encode('def hello():\n print("hello world")\n', return_tensors='pt')
y = model.generate(x)
# decoding, clean_up_tokenization_spaces=False to ensure syntactical correctness
generated_code = tokenizer.decode(y[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(generated_code)
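As a quick sanity check, you can also inspect how the code-optimized vocabulary tokenizes a snippet (the expected vocabulary size of 32768 comes from the description above):
source = 'def add(a, b):\n    return a + b\n'
ids = tokenizer.encode(source)
print(ids)  # token ids from the code-optimized vocabulary
print(tokenizer.decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=False))  # should reproduce the source
print(tokenizer.vocab_size)  # expected: 32768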
Generation
You can generate code using the transformers library. Experiment with different decoding methods and parameters for the best results.
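For instance, you might compare greedy decoding with nucleus sampling. The snippet assumes model, tokenizer, and x are set up as in the usage examples above, and the parameter values are illustrative starting points rather than recommendations:
# greedy decoding: deterministic, often sufficient for short completions
y_greedy = model.generate(x, max_length=100, do_sample=False, eos_token_id=tokenizer.eos_token_id)
# nucleus sampling: more diverse completions; tune top_p and temperature for your use case
y_sampled = model.generate(x, max_length=100, do_sample=True, top_p=0.95, temperature=0.2, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(y_greedy[0], skip_special_tokens=True, clean_up_tokenization_spaces=False))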
Post Processing
Post-processing of the generated code is crucial. Recommended steps include:
- Stop generation when the EOS token is encountered.
- Remove trailing whitespaces.
- Set max_tokens based on your use case.
- Truncate generation at stop words to avoid incomplete code (a sketch is shown after this list).
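A minimal sketch of such a post-processing helper might look like this; the stop words below are hypothetical examples rather than values prescribed by Replit, and generated_code is assumed to come from the examples above:
def postprocess(code: str, stop_words=('\n\n\n', '\nclass ', '\ndef ')) -> str:
    """Truncate generated code at the first stop word and strip trailing whitespace."""
    cut = len(code)
    for stop in stop_words:
        idx = code.find(stop)
        if idx != -1:
            cut = min(cut, idx)  # keep only the text before the earliest stop word
    return code[:cut].rstrip()  # drop trailing whitespace
print(postprocess(generated_code))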
Technical Details
The model has been trained on the MosaicML platform with 256 x A100-40GB GPUs. It leverages techniques like Flash Attention for fast training and inference, AliBi positional embeddings to support variable context length at inference time, and the LionW optimizer.
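As background, ALiBi replaces learned positional embeddings with a per-head linear bias added to the attention scores, which is what allows the context length to vary at inference time. The following is an illustrative sketch of that bias, not the model's actual implementation, and it assumes the number of heads is a power of two:
import torch
def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # head-specific slopes form a geometric sequence: 2^(-8/n), 2^(-16/n), ...
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    # relative distance j - i between key position j and query position i (<= 0 for past tokens)
    pos = torch.arange(seq_len)
    rel = (pos[None, :] - pos[:, None]).clamp(max=0).float()
    # bias of shape (num_heads, seq_len, seq_len); more negative for more distant past tokens
    return slopes[:, None, None] * rel
# the bias is simply added to the attention scores before the softmax (future positions are masked anyway)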
License
The model checkpoint and vocabulary file are licensed under the Creative Commons license (CC BY-SA 4.0). Under the license, you must credit Replit, provide a link to the license, and indicate if changes were made.
Model Information
| Property | Details |
|---|---|
| Model Name | replit-code-v1-3b |
| Model Type | 2.7B Causal Language Model |
| Training Data | Subset of Stack Dedup v1.2 dataset, 525B tokens in total |
| Training Platform | MosaicML with 256 x A100-40GB GPUs |
| Evaluation Dataset | HumanEval |
| pass@1 | 0.219 |
| Model Hash | 5bc28ce32c6f9aec935ead7b60ea1c46 |
Important Note
The pre-training dataset may have contained offensive or inappropriate content even after applying data cleansing filters, and such content may be reflected in model-generated text. We recommend that users exercise reasonable caution when using the model in production systems. Do not use it for any applications that may cause harm or distress to individuals or groups.
Usage Tip
Experiment with different decoding methods and parameters to get the best results for your use case. Also, perform post-processing on the generated code as recommended above to ensure its quality.

