WizardCoder: Empowering Code Large Language Models with Evol-Instruct
WizardCoder is a project that empowers code large language models with Evol-Instruct, providing high-performance code generation and achieving strong results on multiple benchmarks.
Metadata
| Property | Details |
| --- | --- |
| License | Llama2 |
| Metrics | code_eval |
| Library Name | transformers |
| Tags | code |
| Model Name | WizardCoder-Python-34B-V1.0 |
| Task Type | text-generation |
| Dataset | openai_humaneval (HumanEval) |
| Metric (pass@1) | 0.732 |
| Verification Status | false |
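The pass@1 figure above is computed with the `code_eval` metric: a generated solution counts as passing if it satisfies the benchmark's unit tests. As a minimal sketch of how that metric works (the toy `add` problem is our own illustration, not a HumanEval task), using the Hugging Face `evaluate` library:

```python
# Minimal illustration of the code_eval metric behind pass@1.
# The toy problem below is our own example, not from HumanEval.
import os

# code_eval executes model-generated code, so it must be enabled explicitly.
os.environ["HF_ALLOW_CODE_EVAL"] = "1"

from evaluate import load

code_eval = load("code_eval")
test_cases = ["assert add(2, 3) == 5"]               # one reference test per problem
candidates = [["def add(a, b):\n    return a + b"]]  # one sampled solution per problem
pass_at_k, _ = code_eval.compute(references=test_cases, predictions=candidates, k=[1])
print(pass_at_k)  # {'pass@1': 1.0}
```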
Useful Links
📢 News
- [2024/01/04] 🔥 We released WizardCoder-33B-V1.1, trained from deepseek-coder-33b-base, the SOTA OSS Code LLM on the EvalPlus Leaderboard, achieving 79.9 pass@1 on HumanEval, 73.2 pass@1 on HumanEval-Plus, 78.9 pass@1 on MBPP, and 66.9 pass@1 on MBPP-Plus.
- [2024/01/04] 🔥 WizardCoder-33B-V1.1 outperforms ChatGPT 3.5, Gemini Pro, and DeepSeek-Coder-33B-instruct on HumanEval and HumanEval-Plus pass@1.
- [2024/01/04] 🔥 WizardCoder-33B-V1.1 is comparable with ChatGPT 3.5 and surpasses Gemini Pro on MBPP and MBPP-Plus pass@1.
Code Generation Model Comparison
Math-related Model Comparison
- Our WizardMath-70B-V1.0 model slightly outperforms some closed-source LLMs on GSM8K, including ChatGPT 3.5, Claude Instant 1, and PaLM 2 540B.
- Our WizardMath-70B-V1.0 model achieves 81.6 pass@1 on the [GSM8k Benchmarks](https://github.com/openai/grade-school-math), which is 24.8 points higher than the SOTA open-source LLM, and 22.7 pass@1 on the MATH Benchmarks, which is 9.2 points higher than the SOTA open-source LLM.
General LLM Comparison
| Model | Checkpoint | Paper | MT-Bench | AlpacaEval | GSM8k | HumanEval | License |
| --- | --- | --- | --- | --- | --- | --- | --- |
| WizardLM-70B-V1.0 | 🤗 HF Link | 📃 Coming Soon | 7.78 | 92.91% | 77.6% | 50.6 | Llama 2 License |
| WizardLM-13B-V1.2 | 🤗 HF Link | | 7.06 | 89.17% | 55.3% | 36.6 | Llama 2 License |
| WizardLM-13B-V1.1 | 🤗 HF Link | | 6.76 | 86.32% | | 25.0 | Non-commercial |
| WizardLM-30B-V1.0 | 🤗 HF Link | | 7.01 | | | 37.8 | Non-commercial |
| WizardLM-13B-V1.0 | 🤗 HF Link | | 6.35 | 75.31% | | 24.0 | Non-commercial |
| WizardLM-7B-V1.0 | 🤗 HF Link | 📃 WizardLM | | | | 19.1 | Non-commercial |
💪 Comparing WizardCoder-Python-34B-V1.0 with Other LLMs
🔥 The following figure shows that our WizardCoder-Python-34B-V1.0 attains second place on this benchmark, surpassing GPT-4 (2023/03/15, 73.2 vs. 67.0), ChatGPT-3.5 (73.2 vs. 72.5), and Claude 2 (73.2 vs. 71.2).
Prompt Format
"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"
Inference Demo Script
We provide the inference demo code here.
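For orientation, here is a hedged sketch of what inference with 🤗 Transformers might look like; the model ID, dtype, and generation settings are our assumptions, and the linked demo script remains the authoritative version:

```python
# Illustrative inference sketch; the model ID and generation settings are
# assumptions, not the official demo script.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WizardLM/WizardCoder-Python-34B-V1.0"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite a Python function that checks whether a number is prime.\n\n"
    "### Response:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Strip the prompt tokens and print only the completion.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```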
Citation
Please cite this repository if you use its data, method, or code.
```
@article{luo2023wizardcoder,
  title={WizardCoder: Empowering Code Large Language Models with Evol-Instruct},
  author={Luo, Ziyang and Xu, Can and Zhao, Pu and Sun, Qingfeng and Geng, Xiubo and Hu, Wenxiang and Tao, Chongyang and Ma, Jing and Lin, Qingwei and Jiang, Daxin},
  journal={arXiv preprint arXiv:2306.08568},
  year={2023}
}
```