WizardCoder-33B-V1.1開源代碼大模型 - 免費可用實現優異代碼生成效果

首頁

Wizardcoder 15B V1.0

由WizardLMTeam開發

WizardCoder-33B-V1.1是基於deepseek-coder-33b-base訓練的開源代碼大語言模型，在HumanEval等代碼生成基準測試中表現優異。

大型語言模型

Transformers

其他開源協議:Openrail #代碼生成 #編程輔助 #大語言模型

下載量 521

發布時間 : 6/14/2023

模型概述

WizardCoder系列模型專注於代碼生成和理解任務，通過Evol-Instruct方法增強模型能力，在多個代碼生成基準測試中達到SOTA水平。

模型特點

高性能代碼生成

在HumanEval、MBPP等代碼生成基準測試中達到開源模型SOTA水平

Evol-Instruct方法

採用創新的Evol-Instruct訓練方法增強模型代碼理解和生成能力

多基準測試領先

在HumanEval、HumanEval-Plus、MBPP、MBPP-Plus等多個基準測試中表現優異

模型能力

代碼生成

代碼補全

代碼理解

編程問題解答

使用案例

軟件開發

自動化代碼生成

根據自然語言描述生成可運行的代碼

在HumanEval基準測試中達到79.9 pass@1

編程教育

幫助學生理解和學習編程概念

技術面試

編程題解答

解決技術面試中的編程問題

🚀 WizardCoder：通過Evol - Instruct賦能代碼大語言模型

WizardCoder是專門針對編碼任務開發的代碼大語言模型。它藉助Evol - Instruct方法，為代碼相關指令定製提示，對Code LLM（如StarCoder）進行微調，在多個編碼基準測試中表現出色，超越了眾多開源和閉源模型。

🚀 快速開始

若你想快速體驗WizardCoder的能力，可通過以下鏈接訪問在線演示：

WizardCoder在線演示

若你想進行模型微調、推理或評估，請繼續閱讀後續章節。

✨ 主要特性

高性能表現

代碼生成能力強：在[HumanEval基準測試](https://github.com/openai/human - eval)中，WizardCoder - 15B - v1.0模型達到了57.3 pass@1，比當前最優的開源代碼大語言模型高出22.3分。
超越閉源模型：在某些基準測試中，WizardCoder排名第三，超越了Claude - Plus和Bard，且模型規模遠小於這些閉源模型。

定製化訓練

Evol - Instruct方法：針對編碼任務對Evol - Instruct方法進行調整，定製代碼相關指令提示，使用新創建的指令跟隨訓練集對模型進行微調。

多模型支持

除了代碼生成模型WizardCoder，還發布了數學推理模型WizardMath，在數學相關基準測試中也有出色表現。

📦 安裝指南

微調環境安裝

若你想對WizardCoder進行微調，需按照以下步驟安裝環境：

根據[Llama - X](https://github.com/AetherCortex/Llama - X)的說明，安裝環境、下載訓練代碼並部署。注意：需安裝deepspeed==0.9.2和transformers==4.29.2。
用我們倉庫中的train_wizardcoder.py（src/train_wizardcoder.py）替換train.py。
登錄Huggingface：

huggingface-cli login

推理環境安裝

若你想進行推理，需安裝jsonlines：

pip install jsonlines

評估環境安裝

若你想對模型進行評估，需根據[HumanEval](https://github.com/openai/human - eval)的說明安裝環境。

💻 使用示例

微調示例

以下是微調WizardCoder的訓練命令：

deepspeed train_wizardcoder.py \
    --model_name_or_path "bigcode/starcoder" \
    --data_path "/your/path/to/code_instruction_data.json" \
    --output_dir "/your/path/to/ckpt" \
    --num_train_epochs 3 \
    --model_max_length 2048 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50 \
    --save_total_limit 2 \
    --learning_rate 2e-5 \
    --warmup_steps 30 \
    --logging_steps 2 \
    --lr_scheduler_type "cosine" \
    --report_to "tensorboard" \
    --gradient_checkpointing True \
    --deepspeed configs/deepspeed_config.json \
    --fp16 True

推理示例

以下是推理的命令：

python src\inference_wizardcoder.py \
    --base_model "/your/path/to/ckpt" \
    --input_data_path "/your/path/to/input/data.jsonl" \
    --output_data_path "/your/path/to/output/result.jsonl"

data.jsonl的格式如下：

{"idx": 11, "Instruction": "Write a Python code to count 1 to 10."}
{"idx": 12, "Instruction": "Write a Jave code to sum 1 to 10."}

評估示例

以下是在HumanEval上進行評估的步驟：

生成答案：

model="/path/to/your/model"
temp=0.2
max_len=2048
pred_num=200
num_seqs_per_iter=2

output_path=preds/T${temp}_N${pred_num}

mkdir -p ${output_path}
echo 'Output path: '$output_path
echo 'Model to eval: '$model

# 164 problems, 21 per GPU if GPU=8
index=0
gpu_num=8
for ((i = 0; i < $gpu_num; i++)); do
  start_index=$((i * 21))
  end_index=$(((i + 1) * 21))

  gpu=$((i))
  echo 'Running process #' ${i} 'from' $start_index 'to' $end_index 'on GPU' ${gpu}
  ((index++))
  (
    CUDA_VISIBLE_DEVICES=$gpu python humaneval_gen.py --model ${model} \
      --start_index ${start_index} --end_index ${end_index} --temperature ${temp} \
      --num_seqs_per_iter ${num_seqs_per_iter} --N ${pred_num} --max_len ${max_len} --output_path ${output_path}
  ) &
  if (($index % $gpu_num == 0)); then wait; fi
done

運行後處理代碼src/process_humaneval.py收集所有答案文件中的代碼補全：

output_path=preds/T${temp}_N${pred_num}

echo 'Output path: '$output_path
python process_humaneval.py --path ${output_path} --out_path ${output_path}.jsonl --add_prompt

evaluate_functional_correctness ${output_path}.jsonl

📚 詳細文檔

模型信息

屬性	詳情
模型類型	代碼大語言模型
訓練數據	78k進化後的代碼指令

模型對比

WizardCoder與閉源模型對比

WizardCoder在基準測試中排名第三，超越了Claude - Plus（59.8 vs. 53.0）和Bard（59.8 vs. 44.5），且模型規模遠小於這些模型。 WizardCoder與閉源模型對比

WizardCoder與開源模型對比

模型	HumanEval Pass@1	MBPP Pass@1
CodeGen - 16B - Multi	18.3	20.9
CodeGeeX	22.9	24.4
LLaMA - 33B	21.7	30.2
LLaMA - 65B	23.7	37.7
PaLM - 540B	26.2	36.8
PaLM - Coder - 540B	36.0	47.0
PaLM 2 - S	37.6	50.0
CodeGen - 16B - Mono	29.3	35.3
Code - Cushman - 001	33.5	45.9
StarCoder - 15B	33.6	43.6*
InstructCodeT5+	35.0	--
WizardLM - 30B 1.0	37.8	--
WizardCoder - 15B 1.0	57.3	51.8

注：StarCoder在MBPP上的結果為復現結果。上述表格對WizardCoder與其他模型在HumanEval和MBPP基準測試上進行了全面比較，為每個問題生成20個樣本以估計pass@1分數，並使用相同的[代碼](https://github.com/openai/human - eval/tree/master)進行評估。

新聞動態

[2024/01/04] 發佈WizardCoder - 33B - V1.1，基於deepseek - coder - 33b - base訓練，在EvalPlus排行榜上是最優的開源代碼大語言模型，在HumanEval上達到79.9 pass@1，在HumanEval - Plus上達到73.2 pass@1，在MBPP上達到78.9 pass@1，在MBPP - Plus上達到66.9 pass@1。
[2024/01/04] WizardCoder - 33B - V1.1在HumanEval和HumanEval - Plus pass@1上超越了ChatGPT 3.5、Gemini Pro和DeepSeek - Coder - 33B - instruct。
[2024/01/04] WizardCoder - 33B - V1.1在MBPP和MBPP - Plus pass@1上與ChatGPT 3.5相當，超越了Gemini Pro。
[08/11/2023] 發佈WizardMath模型。
WizardMath - 70B - V1.0模型在GSM8K基準測試上略優於一些閉源大語言模型，包括ChatGPT 3.5、Claude Instant 1和PaLM 2 540B。
WizardMath - 70B - V1.0模型在[GSM8k基準測試](https://github.com/openai/grade - school - math)上達到81.6 pass@1，比當前最優的開源大語言模型高24.8分。
WizardMath - 70B - V1.0模型在MATH基準測試上達到22.7 pass@1，比當前最優的開源大語言模型高9.2分。

模型指標

模型	檢查點	論文	HumanEval	HumanEval+	MBPP	MBPP+	許可證
GPT - 4 - Turbo (Nov 2023)	-	-	85.4	81.7	83.0	70.7	-
GPT - 4 (May 2023)	-	-	88.4	76.8	-	-	-
GPT - 3.5 - Turbo (Nov 2023)	-	-	72.6	65.9	81.7	69.4	-
Gemini Pro	-	-	63.4	55.5	72.9	57.9	-
DeepSeek - Coder - 33B - instruct	-	-	78.7	72.6	78.7	66.7	-
WizardCoder - 33B - V1.1	🤗 [HF鏈接](https://huggingface.co/WizardLM/WizardCoder - 33B - V1.1)	📃 WizardCoder	79.9	73.2	78.9	66.9	[MSFTResearch](https://huggingface.co/WizardLM/WizardMath - 7B - V1.1/resolve/main/LICENSE)
WizardCoder - Python - 34B - V1.0	🤗 [HF鏈接](https://huggingface.co/WizardLM/WizardCoder - Python - 34B - V1.0)	📃 WizardCoder	73.2	64.6	73.2	59.9	[Llama2](https://ai.meta.com/resources/models - and - libraries/llama - downloads/)
WizardCoder - 15B - V1.0	🤗 [HF鏈接](https://huggingface.co/WizardLM/WizardCoder - 15B - V1.0)	📃 WizardCoder	59.8	52.4	--	--	[OpenRAIL - M](https://huggingface.co/spaces/bigcode/bigcode - model - license - agreement)
WizardCoder - Python - 13B - V1.0	🤗 [HF鏈接](https://huggingface.co/WizardLM/WizardCoder - Python - 13B - V1.0)	📃 WizardCoder	64.0	--	--	--	[Llama2](https://ai.meta.com/resources/models - and - libraries/llama - downloads/)
WizardCoder - Python - 7B - V1.0	🤗 [HF鏈接](https://huggingface.co/WizardLM/WizardCoder - Python - 7B - V1.0)	📃 WizardCoder	55.5	--	--	--	[Llama2](https://ai.meta.com/resources/models - and - libraries/llama - downloads/)
WizardCoder - 3B - V1.0	🤗 [HF鏈接](https://huggingface.co/WizardLM/WizardCoder - 3B - V1.0)	📃 WizardCoder	34.8	--	--	--	[OpenRAIL - M](https://huggingface.co/spaces/bigcode/bigcode - model - license - agreement)
WizardCoder - 1B - V1.0	🤗 [HF鏈接](https://huggingface.co/WizardLM/WizardCoder - 1B - V1.0)	📃 WizardCoder	23.8	--	--	--	[OpenRAIL - M](https://huggingface.co/spaces/bigcode/bigcode - model - license - agreement)

其他模型指標

模型	檢查點	論文	GSM8k	MATH	在線演示	許可證
WizardMath - 70B - V1.0	🤗 [HF鏈接](https://huggingface.co/WizardLM/WizardMath - 70B - V1.0)	📃 WizardMath	81.6	22.7	演示	[Llama 2](https://ai.meta.com/resources/models - and - libraries/llama - downloads/)
WizardMath - 13B - V1.0	🤗 [HF鏈接](https://huggingface.co/WizardLM/WizardMath - 13B - V1.0)	📃 WizardMath	63.9	14.0	演示	[Llama 2](https://ai.meta.com/resources/models - and - libraries/llama - downloads/)
WizardMath - 7B - V1.0	🤗 [HF鏈接](https://huggingface.co/WizardLM/WizardMath - 7B - V1.0)	📃 WizardMath	54.9	10.7	演示	[Llama 2](https://ai.meta.com/resources/models - and - libraries/llama - downloads/)

模型	檢查點	論文	MT - Bench	AlpacaEval	WizardEval	HumanEval	許可證
WizardLM - 13B - V1.2	🤗 [HF鏈接](https://huggingface.co/WizardLM/WizardLM - 13B - V1.2)		7.06	89.17%	101.4%	36.6 pass@1	[Llama 2 License](https://ai.meta.com/resources/models - and - libraries/llama - downloads/)
WizardLM - 13B - V1.1	🤗 [HF鏈接](https://huggingface.co/WizardLM/WizardLM - 13B - V1.1)		6.76	86.32%	99.3%	25.0 pass@1	非商業用途
WizardLM - 30B - V1.0	🤗 [HF鏈接](https://huggingface.co/WizardLM/WizardLM - 30B - V1.0)		7.01		97.8%	37.8 pass@1	非商業用途
WizardLM - 13B - V1.0	🤗 [HF鏈接](https://huggingface.co/WizardLM/WizardLM - 13B - V1.0)		6.35	75.31%	89.1%	24.0 pass@1	非商業用途
WizardLM - 7B - V1.0	🤗 [HF鏈接](https://huggingface.co/WizardLM/WizardLM - 7B - V1.0)	📃 WizardLM			78.0%	19.1 pass@1	非商業用途

🔧 技術細節

為了開發WizardCoder模型，首先針對編碼任務對Evol - Instruct方法進行調整，將提示定製為與代碼相關的指令。然後，使用新創建的指令跟隨訓練集對代碼大語言模型StarCoder進行微調。

📄 許可證

WizardCoder模型遵循與StarCoder相同的許可證。任何版本的WizardCoder生成的內容都會受到隨機性等不可控變量的影響，因此本項目無法保證輸出的準確性。本項目不承擔模型輸出內容的任何法律責任，也不對因使用相關資源和輸出結果而造成的任何損失負責。

引用

若你使用了本倉庫中的數據、方法或代碼，請引用以下論文：

@article{luo2023wizardcoder,
  title={WizardCoder: Empowering Code Large Language Models with Evol-Instruct},
  author={Luo, Ziyang and Xu, Can and Zhao, Pu and Sun, Qingfeng and Geng, Xiubo and Hu, Wenxiang and Tao, Chongyang and Ma, Jing and Lin, Qingwei and Jiang, Daxin},
  journal={arXiv preprint arXiv:2306.08568},
  year={2023}
}

反饋徵集

歡迎大家使用專業且具有挑戰性的指令對WizardCoder進行評估，並在問題討論區向我們展示模型表現不佳的示例和建議。我們目前專注於改進Evol - Instruct方法，希望在WizardCoder的下一個版本中解決現有問題。之後，我們將開源最新的Evol - Instruct算法代碼和流程，並與大家一起改進它。