Refact-1.6B
The Refact-1.6B model, trained with recent innovations, offers high-performance code completion and chat capabilities, outperforming many larger models.

Quick Start
Finally, the model we started training at the time of our blog post is ready.
After fine-tuning on generated data, it beats Replit 3b, Stability Code 3b, and many other models. It almost matches StarCoder, a model ten times its size!
You can start using it right now by downloading the Refact plugin. You can also host the model yourself using the open-source Docker container.
Features
- High-performance Code Completion: outperforms many larger models in code completion tasks, as shown by the HumanEval pass@1 and pass@10 metrics.
- Multi-language Support: works well in multiple programming languages, as indicated by MultiPL-HumanEval and other metrics.
- Chat Functionality: can be used in a chat format, although it's experimental.
Installation
There are no model-specific installation steps: the model loads through the standard Hugging Face transformers library, so installing transformers (and torch) with pip is enough for the examples below.
Usage Examples
Basic Usage
Fill-in-the-middle uses special tokens to identify the prefix, middle, and suffix parts of the input and output:
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "smallcloudai/Refact-1_6B-fim"
device = "cuda"  # set to "cpu" if no GPU is available

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)

# The model fills in the text between <fim_prefix> and <fim_suffix>,
# here the docstring of print_hello_world().
prompt = '<fim_prefix>def print_hello_world():\n    """<fim_suffix>\n    print("Hello world!")<fim_middle>'

inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
# do_sample=True is needed for temperature to take effect; otherwise decoding is greedy.
outputs = model.generate(inputs, max_length=100, do_sample=True, temperature=0.2)
print("-" * 80)
print(tokenizer.decode(outputs[0]))
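The decoded output repeats the whole prompt, with the generated middle appended after the <fim_middle> token. A minimal sketch for recovering just the infilled text, assuming the model terminates the completion with the tokenizer's EOS token:

completion = tokenizer.decode(outputs[0])
# Everything after <fim_middle> is the generated infill; strip the EOS marker.
middle = completion.split("<fim_middle>", 1)[1]
middle = middle.replace(tokenizer.eos_token, "").strip()
print(middle)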
Advanced Usage
The same model works as chat (experimental).
prompt_template = "<empty_output>SYSTEM {system}\n" \
                  "<empty_output>USER {query}\n" \
                  "<empty_output>ASSISTANT"
prompt = prompt_template.format(system="You are a programming assistant",
                                query="How do I sort a list in Python?")
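Generation then works the same way as in the completion example; a minimal sketch reusing the model and tokenizer loaded above (the sampling settings here are illustrative choices, not values prescribed by the model card):

inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
# Sample a reply; max_new_tokens and temperature are illustrative.
outputs = model.generate(inputs, max_new_tokens=200, do_sample=True, temperature=0.2)
print(tokenizer.decode(outputs[0]))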
Documentation
Model Comparison
| Model | Size | HumanEval pass@1 | HumanEval pass@10 |
|---|---|---|---|
| DeciCoder-1b | 1b | 19.1% | |
| Refact-1.6-fim | 1.6b | 32.0% | 53.0% |
| StableCode | 3b | 20.2% | 33.8% |
| ReplitCode v1 | 3b | 21.9% | |
| CodeGen2.5-multi | 7b | 28.4% | 47.5% |
| CodeLlama | 7b | 33.5% | 59.6% |
| StarCoder | 15b | 33.6% | |
Chat Performance Comparison
| Model | Size | pass@1 | pass@10 |
|---|---|---|---|
| Refact-1.6-fim | 1.6b | 38.4% | 55.6% |
| StableCode-instruct | 3b | 26.9% | 36.2% |
| OctoGeeX | 6b | 44.7% | |
| CodeLlama-instruct | 7b | 34.8% | 64.3% |
| CodeGen2.5-instruct | 7b | 36.2% | 60.87% |
| CodeLlama-instruct | 13b | 42.7% | 71.6% |
| StarChat-β | 15b | 33.5% | |
| OctoCoder | 15b | 46.2% | |
Technical Details
Architecture
As described in more detail in the blog post, we used a LLaMA-like architecture with multi-query attention (see the model stats below). We also used the LiON optimizer, flash attention, and early dropout.
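For intuition, multi-query attention keeps a separate query projection per head but shares a single key/value head across all of them, which shrinks the KV cache at inference time. A minimal PyTorch sketch of the idea follows; it is illustrative only, not the model's actual implementation, and it omits details such as positional encoding and KV caching:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiQueryAttention(nn.Module):
    """n_head query heads sharing one key/value head (illustrative sketch)."""
    def __init__(self, d_model: int, n_head: int):
        super().__init__()
        self.n_head = n_head
        self.d_head = d_model // n_head
        self.q_proj = nn.Linear(d_model, d_model)           # one query projection per head
        self.kv_proj = nn.Linear(d_model, 2 * self.d_head)  # a single shared K and V head
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_head, self.d_head).transpose(1, 2)
        k, v = self.kv_proj(x).split(self.d_head, dim=-1)
        k = k.unsqueeze(1)  # (B, 1, T, d_head) broadcasts across all query heads
        v = v.unsqueeze(1)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out_proj(y.transpose(1, 2).reshape(B, T, -1))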
Pretraining
For the base model, we used our own dataset, which contains only code with permissive licenses, plus open text datasets. Filtering was the key to this model's success:
- We only used text in English
- We only used topics related to computer science
- We applied heavy deduplication (see the sketch below)

The text-to-code proportion was 50:50, and the model was trained for 1.2T tokens.
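The deduplication method is not detailed here; a minimal sketch of a common exact-match approach, hashing whitespace-normalized documents (an assumption for illustration, not the actual pipeline):

import hashlib

def deduplicate(documents):
    """Keep the first occurrence of each exact (whitespace-normalized) document."""
    seen, unique = set(), []
    for doc in documents:
        # Hash a normalized form so trivial whitespace differences still collide.
        key = hashlib.sha256(" ".join(doc.split()).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique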
Finetuning
We tested the hypothesis that chat data should boost the base model's performance on FIM and regular left-to-right code completion. We found that just 15% of open code instruction-following datasets, filtered for quality, improves almost all metrics.
The remaining 85% of the fine-tuning dataset was used to address the distribution shift between typical code on the internet and the code you actually write in your IDE. The best attempt used 40B tokens.
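One way to picture the 15:85 mix is sampling each training example from the two sources with fixed probabilities; a minimal sketch with hypothetical dataset variables, not the actual training code:

import random

def sample_mixed(instruct_data, ide_style_data, p_instruct=0.15):
    """Draw ~15% instruction-following and ~85% IDE-style examples."""
    source = instruct_data if random.random() < p_instruct else ide_style_data
    return random.choice(source)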
Model Stats
| Property | Details |
|---|---|
| Architecture | LLaMA-like model with multi-query attention |
| Objectives | Fill-in-the-Middle, Chat |
| Context length | 4096 tokens |
| Pretraining tokens | 1.2T |
| Fine-tuning tokens | 40B |
| Precision | bfloat16 |
| GPUs | 64 NVIDIA A5000 |
| Training time | 28 days |
License
The model is licensed under the BigScience OpenRAIL-M v1 license agreement.
Limitations and Bias
The Refact-1.6B model was trained on English text only, although it has seen many more natural languages in code comments. Its performance on non-English languages is therefore lower.
Citation
If you use this model, please link to this page.