Refact-1.6B
The Refact-1.6B model, trained with recent innovations, offers high-performance code completion and chat capabilities, outperforming many larger models.

Quick Start
Finally, the model we started training at the time of our blog post is ready.
After fine-tuning on generated data, it beats Replit 3b, Stability Code 3b, and many other models. It almost matches StarCoder, a model ten times its size!
You can start using it right now by downloading the Refact plugin. You can also host the model yourself using the open-source Docker container.
Features
- High-performance Code Completion: outperforms many larger models in code completion tasks, as shown by the HumanEval pass@1 and pass@10 metrics.
- Multi-language Support: works well in multiple programming languages, as indicated by MultiPL-HumanEval and other metrics.
- Chat Functionality: can be used in a chat format, although it's experimental.
Installation
There are no model-specific installation steps: the model loads through the standard Hugging Face transformers library, so installing transformers (and torch) with pip is enough for the examples below.
Usage Examples
Basic Usage
Fill-in-the-middle uses special tokens to identify the prefix, middle, and suffix parts of the input and output:
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "smallcloudai/Refact-1_6B-fim"
device = "cuda"  # set to "cpu" if no GPU is available

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)

# The model fills in the text between <fim_prefix> and <fim_suffix>,
# here the docstring of print_hello_world().
prompt = '<fim_prefix>def print_hello_world():\n    """<fim_suffix>\n    print("Hello world!")<fim_middle>'

inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
# do_sample=True is needed for temperature to take effect; otherwise decoding is greedy.
outputs = model.generate(inputs, max_length=100, do_sample=True, temperature=0.2)
print("-" * 80)
print(tokenizer.decode(outputs[0]))
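The decoded output repeats the whole prompt, with the generated middle appended after the <fim_middle> token. A minimal sketch for recovering just the infilled text, assuming the model terminates the completion with the tokenizer's EOS token:

completion = tokenizer.decode(outputs[0])
# Everything after <fim_middle> is the generated infill; strip the EOS marker.
middle = completion.split("<fim_middle>", 1)[1]
middle = middle.replace(tokenizer.eos_token, "").strip()
print(middle)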
Advanced Usage
The same model works as chat (experimental).
prompt_template = "<empty_output>SYSTEM {system}\n" \
                  "<empty_output>USER {query}\n" \
                  "<empty_output>ASSISTANT"
prompt = prompt_template.format(system="You are a programming assistant",
                                query="How do I sort a list in Python?")
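Generation then works the same way as in the completion example; a minimal sketch reusing the model and tokenizer loaded above (the sampling settings here are illustrative choices, not values prescribed by the model card):

inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
# Sample a reply; max_new_tokens and temperature are illustrative.
outputs = model.generate(inputs, max_new_tokens=200, do_sample=True, temperature=0.2)
print(tokenizer.decode(outputs[0]))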
Documentation
Model Comparison
| Model | Size | HumanEval pass@1 | HumanEval pass@10 |
|---|---|---|---|
| DeciCoder-1b | 1b | 19.1% | |
| Refact-1.6-fim | 1.6b | 32.0% | 53.0% |
| StableCode | 3b | 20.2% | 33.8% |
| ReplitCode v1 | 3b | 21.9% | |
| CodeGen2.5-multi | 7b | 28.4% | 47.5% |
| CodeLlama | 7b | 33.5% | 59.6% |
| StarCoder | 15b | 33.6% | |
Chat Performance Comparison
| Model | Size | pass@1 | pass@10 |
|---|---|---|---|
| Refact-1.6-fim | 1.6b | 38.4% | 55.6% |
| StableCode-instruct | 3b | 26.9% | 36.2% |
| OctoGeeX | 6b | 44.7% | |
| CodeLlama-instruct | 7b | 34.8% | 64.3% |
| CodeGen2.5-instruct | 7b | 36.2% | 60.87% |
| CodeLlama-instruct | 13b | 42.7% | 71.6% |
| StarChat-β | 15b | 33.5% | |
| OctoCoder | 15b | 46.2% | |
Technical Details
Architecture
As described in more detail in the blog post, we used a LLaMA-like architecture with multi-query attention (see the model stats below). We also used the LiON optimizer, flash attention, and early dropout.
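For intuition, multi-query attention keeps a separate query projection per head but shares a single key/value head across all of them, which shrinks the KV cache at inference time. A minimal PyTorch sketch of the idea follows; it is illustrative only, not the model's actual implementation, and it omits details such as positional encoding and KV caching:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiQueryAttention(nn.Module):
    """n_head query heads sharing one key/value head (illustrative sketch)."""
    def __init__(self, d_model: int, n_head: int):
        super().__init__()
        self.n_head = n_head
        self.d_head = d_model // n_head
        self.q_proj = nn.Linear(d_model, d_model)           # one query projection per head
        self.kv_proj = nn.Linear(d_model, 2 * self.d_head)  # a single shared K and V head
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_head, self.d_head).transpose(1, 2)
        k, v = self.kv_proj(x).split(self.d_head, dim=-1)
        k = k.unsqueeze(1)  # (B, 1, T, d_head) broadcasts across all query heads
        v = v.unsqueeze(1)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out_proj(y.transpose(1, 2).reshape(B, T, -1))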
Pretraining
For the base model, we used our own dataset, which contains only code with permissive licenses, plus open text datasets. Filtering was the key to this model's success:
- We only used text in English
- We only used topics related to computer science
- We applied heavy deduplication (see the sketch below)

The text-to-code proportion was 50:50, and the model was trained for 1.2T tokens.
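The deduplication method is not detailed here; a minimal sketch of a common exact-match approach, hashing whitespace-normalized documents (an assumption for illustration, not the actual pipeline):

import hashlib

def deduplicate(documents):
    """Keep the first occurrence of each exact (whitespace-normalized) document."""
    seen, unique = set(), []
    for doc in documents:
        # Hash a normalized form so trivial whitespace differences still collide.
        key = hashlib.sha256(" ".join(doc.split()).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique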
Finetuning
We tested the hypothesis that chat data should boost the base model's performance on FIM and regular left-to-right code completion. We found that just 15% of open code instruction-following datasets, filtered for quality, improves almost all metrics.
The remaining 85% of the fine-tuning dataset was used to address the distribution shift between typical code on the internet and the code you actually write in your IDE. The best attempt used 40B tokens.
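One way to picture the 15:85 mix is sampling each training example from the two sources with fixed probabilities; a minimal sketch with hypothetical dataset variables, not the actual training code:

import random

def sample_mixed(instruct_data, ide_style_data, p_instruct=0.15):
    """Draw ~15% instruction-following and ~85% IDE-style examples."""
    source = instruct_data if random.random() < p_instruct else ide_style_data
    return random.choice(source)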
Model Stats
| Property | Details |
|---|---|
| Architecture | LLaMA-like model with multi-query attention |
| Objectives | Fill-in-the-Middle, Chat |
| Context length | 4096 tokens |
| Pretraining tokens | 1.2T |
| Fine-tuning tokens | 40B |
| Precision | bfloat16 |
| GPUs | 64 NVIDIA A5000 |
| Training time | 28 days |
License
The model is licensed under the BigScience OpenRAIL-M v1 license agreement.
Limitations and Bias
The Refact-1.6B model was trained on English text only, although it has seen many more natural languages in code comments. Its performance on non-English languages is therefore lower.
Citation
If you use this model, please link to this page.