# WizardCoder and Related Models
This repository focuses on a series of models, including WizardCoder, WizardMath, and WizardLM, which aim to provide high-performance language models for tasks such as code generation and math problem solving.
## Documentation

**Note:** This is a replica of the official repository, intended solely for research purposes such as reproducing results. If there are any copyright concerns, please contact me.
HF Repo • GitHub Repo • Twitter • [WizardLM] • [WizardCoder] • [WizardMath]
Join our Discord
## News
- [2023/08/26] We released WizardCoder-Python-34B-V1.0, which achieves 73.2 pass@1 and surpasses GPT-4 (2023/03/15), ChatGPT-3.5, and Claude 2 on the HumanEval benchmark.
- [2023/06/16] We released WizardCoder-15B-V1.0, which achieves 57.3 pass@1 and surpasses Claude-Plus (+6.8), Bard (+15.3), and InstructCodeT5+ (+22.3) on the HumanEval benchmark.
Note: There are two sets of HumanEval results for GPT-4 and ChatGPT-3.5. The scores of 67.0 and 48.1 are reported in OpenAI's official GPT-4 report (2023/03/15), while 82.0 and 72.5 were measured by ourselves against the latest API (2023/08/26).
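The pass@1 numbers quoted here follow the standard unbiased pass@k estimator from the Codex evaluation protocol. A minimal sketch of that estimator is shown below for orientation; it is illustrative and not code taken from this repository.

```python
# Sketch of the standard unbiased pass@k estimator (Chen et al., 2021),
# which underlies the pass@1 numbers quoted above.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """n: total samples generated, c: samples passing all tests, k: evaluation budget."""
    if n - c < k:
        return 1.0
    # Numerically stable form of 1 - C(n-c, k) / C(n, k).
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example (hypothetical counts): 200 samples per problem, 146 passing -> pass@1 = 0.73
print(pass_at_k(200, 146, 1))
```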
| Property | Details |
|---|---|
| Model Type | WizardCoder-Python-34B-V1.0, WizardCoder-15B-V1.0, etc. |
| Training Data | Not specified |
| Metrics | code_eval |
| Library Name | transformers |
| License | llama2 (for some models), OpenRAIL-M (for some models) |
- Our WizardMath-70B-V1.0 model slightly outperforms some closed-source LLMs on GSM8K, including ChatGPT-3.5, Claude Instant 1, and PaLM 2 540B.
- Our WizardMath-70B-V1.0 model achieves 81.6 pass@1 on the [GSM8k benchmark](https://github.com/openai/grade-school-math), which is 24.8 points higher than the SOTA open-source LLM, and 22.7 pass@1 on the MATH benchmark, which is 9.2 points higher than the SOTA open-source LLM.
- [2023/08/09] We released the WizardLM-70B-V1.0 model. Here are the [full model weights](https://huggingface.co/WizardLM/WizardLM-70B-V1.0).
| Model | Checkpoint | Paper | MT-Bench | AlpacaEval | GSM8k | HumanEval | License |
|---|---|---|---|---|---|---|---|
| WizardLM-70B-V1.0 | HF Link | Coming Soon | 7.78 | 92.91% | 77.6% | 50.6 | Llama 2 License |
| WizardLM-13B-V1.2 | HF Link | | 7.06 | 89.17% | 55.3% | 36.6 | Llama 2 License |
| WizardLM-13B-V1.1 | HF Link | | 6.76 | 86.32% | | 25.0 | Non-commercial |
| WizardLM-30B-V1.0 | HF Link | | 7.01 | | | 37.8 | Non-commercial |
| WizardLM-13B-V1.0 | HF Link | | 6.35 | 75.31% | | 24.0 | Non-commercial |
| WizardLM-7B-V1.0 | HF Link | [WizardLM] | | | | 19.1 | Non-commercial |
## Comparing WizardCoder-Python-34B-V1.0 with Other LLMs

The following figure shows that our WizardCoder-Python-34B-V1.0 attains the second position on this benchmark, surpassing GPT-4 (2023/03/15, 73.2 vs. 67.0), ChatGPT-3.5 (73.2 vs. 72.5), and Claude 2 (73.2 vs. 71.2).
## Prompt Format

```
"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"
```
## Inference Demo Script
We provide the inference demo code here.
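The linked demo code is the authoritative reference. As a rough orientation only, the following is a minimal sketch of loading a WizardCoder checkpoint with the `transformers` library; the model ID, dtype, and generation parameters are assumptions for illustration, not settings taken from the official script.

```python
# Minimal inference sketch using Hugging Face transformers (not the official demo script).
# Assumption: the model ID below is available on the Hub and fits in available GPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WizardLM/WizardCoder-Python-34B-V1.0"  # assumed Hub ID; swap for a smaller checkpoint if needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to reduce memory use
    device_map="auto",          # spread layers across available devices
)

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite a Python function that returns the n-th Fibonacci number.\n\n"
    "### Response:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.2,  # assumed decoding settings
    do_sample=True,
)
# Strip the prompt tokens so only the model's completion is printed.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```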
## Citation

Please cite the repository if you use its data, method, or code.
```bibtex
@article{luo2023wizardcoder,
  title={WizardCoder: Empowering Code Large Language Models with Evol-Instruct},
  author={Luo, Ziyang and Xu, Can and Zhao, Pu and Sun, Qingfeng and Geng, Xiubo and Hu, Wenxiang and Tao, Chongyang and Ma, Jing and Lin, Qingwei and Jiang, Daxin},
  journal={arXiv preprint arXiv:2306.08568},
  year={2023}
}
```