patched-coder-34b
An instruction fine-tuned model focused on code patching, including bug fixing, security vulnerability remediation, API migrations, and other code maintenance tasks.
🚀 Quick Start
Installation
Make sure to install Transformers from the main git branch:
```bash
pip install git+https://github.com/huggingface/transformers.git
```
How to Prompt the Model
This model accepts the Alpaca instruction format.
For example:
```
### Instruction:
{instruction}
### Input:
{input}
### Response:
...
```
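As a convenience, the prompt can be assembled with a small helper like the following (a hypothetical function for illustration; only the section markers above come from the model card):

```python
def build_prompt(instruction: str, input_text: str) -> str:
    """Assemble an Alpaca-style prompt in the format the model expects."""
    return (
        "### Instruction:\n"
        f"{instruction}\n"
        "### Input:\n"
        f"{input_text}\n"
        "### Response:\n"
    )
```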
✨ Features
This is an instruction fine-tuned model focused on the task of patching code. Patching may include fixing bugs, remediating security vulnerabilities, performing API migrations, and other kinds of code maintenance.
💻 Usage Examples
Basic Usage
This model accepts the Alpaca instruction format shown under Quick Start above.
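A minimal inference sketch, assuming the Hugging Face repository id patched-codes/patched-coder-34b and generic generation settings (both are assumptions; adjust to your environment):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "patched-codes/patched-coder-34b"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Alpaca-style prompt: the patching request goes in Instruction,
# the code to patch goes in Input.
prompt = (
    "### Instruction:\n"
    "Fix the bug in the following function.\n"
    "### Input:\n"
    "def add(a, b):\n"
    "    return a - b\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```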
📚 Documentation
Model Details
Training Details
- GPU: A100 80 GB
- Time: ~8 hrs
Training Data
The model was fine-tuned on commitpackft, an open dataset of code commits. We started with the commits for the Python language from the dataset and then filtered for commits related to fixing bugs.
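The exact filtering criteria are not published; a sketch of this kind of selection with the datasets library might look like the following (the dataset id and the keyword heuristic are assumptions):

```python
from datasets import load_dataset

# commitpackft is published per language on the Hugging Face Hub
ds = load_dataset("bigcode/commitpackft", "python", split="train")

BUG_FIX_KEYWORDS = ("fix", "bug", "patch", "error", "issue")

def is_bug_fix(example):
    # Keyword heuristic over the commit message; an assumption,
    # not the authors' documented filter.
    return any(k in example["message"].lower() for k in BUG_FIX_KEYWORDS)

bug_fix_commits = ds.filter(is_bug_fix)
print(f"kept {len(bug_fix_commits)} of {len(ds)} commits")
```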
Training Procedure
Instruction fine-tuning to follow natural-language instructions related to code. We load the quantized base model in 4-bit and then use QLoRA for Parameter-Efficient Fine-Tuning (PEFT) with Flash Attention. The model was trained for 2 epochs.
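A sketch of this setup with transformers and peft follows; the base model id and all LoRA hyperparameters are assumptions, since the card does not report them (the full quantization config is listed under Training Hyperparameters below):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization, matching the config listed below
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-34b-Python-hf",      # assumed base model
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # Flash Attention during training
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                     # assumed; not reported in the card
    lora_alpha=32,            # assumed
    lora_dropout=0.05,        # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # only the LoRA adapters are trainable
```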
Training Hyperparameters
Training regime: 4-bit QLoRA fine-tuning with bfloat16 compute.
The following bitsandbytes
quantization config was used during training:
- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16
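Expressed in code, this corresponds to the following transformers BitsAndBytesConfig (quant_method is implied by the class itself):

```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_8bit=False,
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```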
Evaluation
We evaluated the model on the HumanEval (code generation) and HumanEvalFix Python (bug fixing) benchmarks using the Code Generation LM Evaluation Harness. To evaluate the model for vulnerability remediation, we used the Static Analysis Eval benchmark.
Results
| Model                | HumanEval | HumanEvalFix Python | Static Analysis Eval |
|----------------------|-----------|---------------------|----------------------|
| patched-coder-34b    | 53.57     | 41.34               | 51.32                |
| CodeLlama-34b-Python | 53.29     | 33.14               | 27.63                |
| GPT-4                | 86.6      | 47                  | 55.26                |
Based on the results on these benchmarks, patched-coder-34b is the state-of-the-art open code LLM. Other code LLMs (e.g., from WizardCoder and Phind) were trained either on unknown proprietary datasets or on data generated via OpenAI's APIs, making them unviable for commercial use.
🔧 Technical Details
The model is instruction fine-tuned for code patching tasks. During training it loads the quantized base model in 4-bit and uses QLoRA for Parameter-Efficient Fine-Tuning (PEFT) with Flash Attention. The training data consists of Python-language, bug-fix-related commits filtered from the commitpackft dataset.
📄 License
The model is licensed under the Llama 2 license.
⚠️ Important Note
This model has undergone very limited testing. Additional safety testing should be performed before any real-world deployments.