🚀 OctoCoder
OctoCoder is an instruction-tuned model that can handle tasks across many programming languages. It was created by fine-tuning StarCoder on CommitPackFT and OASST, with the goal of providing high-quality code generation and instruction-following capabilities.
🚀 Quick Start
Intended use
The model follows instructions provided in the input. You should always preface your input with "Question: " and finish it with "Answer:", for example: "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
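For convenience, the prompt format can be wrapped in a small helper. The function below is purely illustrative and not part of the released code:

def build_prompt(instruction: str) -> str:
    # Wrap an instruction in the "Question: ... Answer:" format OctoCoder expects
    return f"Question: {instruction}\n\nAnswer:"

prompt = build_prompt("Please write a function in Python that performs bubble sort.")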
Feel free to share your generations in the Community tab!
Generation
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/octocoder"
device = "cuda"  # use "cpu" if no GPU is available

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# Prompts must follow the "Question: ... Answer:" format described above
inputs = tokenizer.encode("Question: Please write a function in Python that performs bubble sort.\n\nAnswer:", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
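Depending on the default generation settings, the answer may be cut off. Generation parameters can be passed explicitly; a minimal sketch, with illustrative values:

# Generate up to 256 new tokens greedily; adjust to taste
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))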
✨ Features
- Multilingual Support: Capable of handling over 80 programming languages.
- Instruction-Following: Can accurately follow instructions provided in the input for code generation.
📦 Installation
Although the original model card does not list installation steps, you need the transformers library to run the generation example above:
pip install -q transformers
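The snippet above also relies on PyTorch, which is assumed to be installed; if it is not:

pip install -q torch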
📚 Documentation
Model Summary
| Property | Details |
|----------|---------|
| Model Type | OctoCoder is an instruction-tuned model with 15.5B parameters, created by fine-tuning StarCoder on CommitPackFT & OASST as described in the OctoPack paper. |
| Training Data | CommitPack (4TB of GitHub commits across 350 programming languages), CommitPackFT (filtered version of CommitPack for high-quality commit messages that resemble instructions), and OASST |
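The instruction-tuning data can be inspected directly from the Hugging Face Hub. A minimal sketch, assuming the datasets library and the bigcode/commitpackft dataset ID with one configuration per programming language:

from datasets import load_dataset

# Load the Python configuration of CommitPackFT
commitpackft_py = load_dataset("bigcode/commitpackft", "python", split="train")
print(commitpackft_py[0])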
OctoPack🐙🎒
| Data/Model/Evaluation | Name | Details |
|---|---|---|
| Data | CommitPack | 4TB of GitHub commits across 350 programming languages |
| Data | CommitPackFT | Filtered version of CommitPack for high-quality commit messages that resemble instructions |
| Model | OctoCoder | StarCoder (16B parameters) instruction tuned on CommitPackFT + OASST |
| Model | OctoGeeX | CodeGeeX2 (6B parameters) instruction tuned on CommitPackFT + OASST |
| Evaluation | HumanEvalPack | Extension of OpenAI's HumanEval to cover 3 scenarios across 6 languages |
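The HumanEvalPack benchmark is also hosted on the Hub. A minimal sketch, assuming the bigcode/humanevalpack dataset with per-language configurations and a test split:

from datasets import load_dataset

# Each language (e.g. "python", "js", "java", "go", "cpp", "rust") is a separate configuration
humanevalpack_py = load_dataset("bigcode/humanevalpack", "python", split="test")
print(humanevalpack_py[0])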
Training
Model
- Architecture: GPT-2 model with multi-query attention and Fill-in-the-Middle objective
- Steps: 250k for pretraining & 30 for instruction tuning
- Tokens: 1 trillion for pretraining & 2M for instruction tuning
- Precision: bfloat16 (see the loading sketch below)
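Since the model was trained in bfloat16, it can also be loaded in that precision to roughly halve memory use. A minimal sketch; torch_dtype and device_map="auto" are standard transformers arguments, not taken from the original card:

import torch
from transformers import AutoModelForCausalLM

# Load OctoCoder in bfloat16 and let the weights be placed across available devices
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/octocoder",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)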
Hardware
- Pretraining:
- GPUs: 512 Tesla A100
- Training time: 24 days
- Instruction tuning:
- GPUs: 8 Tesla A100
- Training time: 4 hours
Software
- Orchestration: [Megatron-LM/Transformers](https://github.com/bigcode-project/octopack#training)
- Neural networks: PyTorch
Results
| Task | Dataset | Metric | Value | Verified |
|---|---|---|---|---|
| Text Generation | bigcode/humanevalpack (HumanEvalSynthesize Python) | pass@1 | 46.2 | false |
| Text Generation | bigcode/humanevalpack (HumanEvalSynthesize JavaScript) | pass@1 | 39.2 | false |
| Text Generation | bigcode/humanevalpack (HumanEvalSynthesize Java) | pass@1 | 38.2 | false |
| Text Generation | bigcode/humanevalpack (HumanEvalSynthesize Go) | pass@1 | 30.4 | false |
| Text Generation | bigcode/humanevalpack (HumanEvalSynthesize C++) | pass@1 | 35.6 | false |
| Text Generation | bigcode/humanevalpack (HumanEvalSynthesize Rust) | pass@1 | 23.4 | false |
| Text Generation | bigcode/humanevalpack (HumanEvalSynthesize Average) | pass@1 | 35.5 | false |
| Text Generation | bigcode/humanevalpack (HumanEvalFix Python) | pass@1 | 30.4 | false |
| Text Generation | bigcode/humanevalpack (HumanEvalFix JavaScript) | pass@1 | 28.4 | false |
| Text Generation | bigcode/humanevalpack (HumanEvalFix Java) | pass@1 | 30.6 | false |
| Text Generation | bigcode/humanevalpack (HumanEvalFix Go) | pass@1 | 30.2 | false |
| Text Generation | bigcode/humanevalpack (HumanEvalFix C++) | pass@1 | 26.1 | false |
| Text Generation | bigcode/humanevalpack (HumanEvalFix Rust) | pass@1 | 16.5 | false |
| Text Generation | bigcode/humanevalpack (HumanEvalFix Average) | pass@1 | 27.0 | false |
| Text Generation | bigcode/humanevalpack (HumanEvalExplain Python) | pass@1 | 35.1 | false |
| Text Generation | bigcode/humanevalpack (HumanEvalExplain JavaScript) | pass@1 | 24.5 | false |
| Text Generation | bigcode/humanevalpack (HumanEvalExplain Java) | pass@1 | 27.3 | false |
| Text Generation | bigcode/humanevalpack (HumanEvalExplain Go) | pass@1 | 21.1 | false |
| Text Generation | bigcode/humanevalpack (HumanEvalExplain C++) | pass@1 | 24.1 | false |
| Text Generation | bigcode/humanevalpack (HumanEvalExplain Rust) | pass@1 | 14.8 | false |
| Text Generation | bigcode/humanevalpack (HumanEvalExplain Average) | pass@1 | 24.5 | false |
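All values above are pass@1 scores. pass@k is usually estimated with the unbiased estimator from the HumanEval paper, 1 - C(n-c, k)/C(n, k) averaged over problems, where n samples are generated per problem and c of them pass the unit tests. A minimal sketch of that estimator (not code from the OctoPack repository):

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased estimator of pass@k: probability that at least one of k samples,
    # drawn without replacement from n generations of which c are correct, passes.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 20 generations, 9 correct -> estimated pass@1 of 0.45
print(pass_at_k(n=20, c=9, k=1))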
📄 License
The model is released under the bigcode-openrail-m license.
📖 Citation
@article{muennighoff2023octopack,
title={OctoPack: Instruction Tuning Code Large Language Models},
author={Niklas Muennighoff and Qian Liu and Armel Zebaze and Qinkai Zheng and Binyuan Hui and Terry Yue Zhuo and Swayam Singh and Xiangru Tang and Leandro von Werra and Shayne Longpre},
journal={arXiv preprint arXiv:2308.07124},
year={2023}
}