WizardCoder-33B-V1.1: an open-source code LLM with strong benchmark results, free to deploy

Wizardcoder 33B V1.1

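Developed by WizardLMTeam

WizardCoder-33B-V1.1 is an open-source code large language model trained from deepseek-coder-33b-base. It scores highly on benchmarks such as HumanEval and MBPP, making it a state-of-the-art (SOTA) open-source code LLM.

Large language model

Transformers

Other #code-generation-SOTA #programming-task-optimization #multilingual-code-support

Downloads: 293

Released: 2024/01/04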

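Model overview

WizardCoder is a code large language model enhanced with the Evol-Instruct method, focused on code generation and programming tasks.

Model features

High-performance code generation

Reaches 79.9 pass@1 on HumanEval, ahead of ChatGPT 3.5 and Gemini Pro.

Evol-Instruct training method

Trained by applying Code Evol-Instruct to Code-Alpaca data.

Open-source SOTA

The top open-source code LLM on the EvalPlus leaderboard as of its 2024/01/04 release.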

Model capabilities

Code completion

Code generation

Answering programming questions

Code explanation

Code refactoring

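Use cases

Software development

Automated code generation

Generates runnable code from natural-language descriptions.

Reaches a pass@1 of 79.9% on the HumanEval benchmark.

Programming education

Helps students understand and learn programming concepts.

Technical interview preparation

Solving programming problems

Generates solutions to programming interview questions.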

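🚀 WizardCoder: Empowering Code Large Language Models with Evol-Instruct

WizardCoder is a large language model focused on code generation, with its performance improved through Evol-Instruct. It scores well on several code benchmarks, including HumanEval, HumanEval-Plus, MBPP, and MBPP-Plus.

Key information

Property	Details
Model type	WizardCoder
Evaluation metric	code_eval
Training data processing	Code Evol-Instruct applied to Code-Alpaca data
Test dataset	openai_humaneval (HumanEval)
pass@1	0.799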
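
To make the pass@1 figure concrete, here is a minimal sketch of the standard unbiased pass@k estimator from the original HumanEval evaluation setup. With a single greedy sample per problem, it reduces to the fraction of problems solved. The count of 131 solved problems is a hypothetical value chosen to be consistent with the reported 0.799 over the 164 HumanEval problems mentioned in the reproduction script; it is not a number stated in this document.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(solved: Sequence[bool]) -> float:
    """One greedy sample per problem: pass@1 is the mean of the per-problem results."""
    return sum(pass_at_k(1, int(ok), 1) for ok in solved) / len(solved)


# Hypothetical outcome: 131 of 164 problems solved.
flags = [True] * 131 + [False] * 33
print(round(benchmark_pass_at_1(flags), 3))  # 0.799
```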

📢 News

[2024/01/04] 🔥 We released WizardCoder-33B-V1.1, trained from deepseek-coder-33b-base. It is the SOTA open-source code LLM on the EvalPlus leaderboard, with a pass@1 of 0.799 on HumanEval, 0.732 on HumanEval-Plus, 0.789 on MBPP, and 0.669 on MBPP-Plus.
[2024/01/04] 🔥 WizardCoder-33B-V1.1 outperforms ChatGPT 3.5, Gemini Pro, and DeepSeek-Coder-33B-instruct on HumanEval and HumanEval-Plus pass@1.
[2024/01/04] 🔥 WizardCoder-33B-V1.1 is comparable to ChatGPT 3.5 and surpasses Gemini Pro on MBPP and MBPP-Plus pass@1.

Model performance comparison

Model	Checkpoint	Paper	HumanEval	HumanEval+	MBPP	MBPP+	License
GPT-4-Turbo (Nov 2023)	-	-	85.4	81.7	83.0	70.7	-
GPT-4 (May 2023)	-	-	88.4	76.8	-	-	-
GPT-3.5-Turbo (Nov 2023)	-	-	72.6	65.9	81.7	69.4	-
Gemini Pro	-	-	63.4	55.5	72.9	57.9	-
DeepSeek-Coder-33B-instruct	-	-	78.7	72.6	78.7	66.7	-
WizardCoder-33B-V1.1	🤗 HF Link	📃 WizardCoder	79.9	73.2	78.9	66.9	MSFTResearch
WizardCoder-Python-34B-V1.0	🤗 HF Link	📃 WizardCoder	73.2	64.6	73.2	59.9	Llama2
WizardCoder-15B-V1.0	🤗 HF Link	📃 WizardCoder	59.8	52.4	-	-	OpenRAIL-M
WizardCoder-Python-13B-V1.0	🤗 HF Link	📃 WizardCoder	64.0	-	-	-	Llama2
WizardCoder-Python-7B-V1.0	🤗 HF Link	📃 WizardCoder	55.5	-	-	-	Llama2
WizardCoder-3B-V1.0	🤗 HF Link	📃 WizardCoder	34.8	-	-	-	OpenRAIL-M
WizardCoder-1B-V1.0	🤗 HF Link	📃 WizardCoder	23.8	-	-	-	OpenRAIL-M

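📦 Training data

We apply our Code Evol-Instruct to Code-Alpaca data.

❗ Data contamination check

Before training, we carefully and rigorously checked all training data and used multiple deduplication methods to verify that no HumanEval or MBPP test data leaked into it.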
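
The text does not say which deduplication methods were used. As an illustration only, one common way to screen for benchmark leakage is word-level n-gram overlap. The sketch below is an assumed stand-in, not the authors' procedure; the function names and the default window of 8 words are hypothetical choices.

```python
from typing import List, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Lowercased word n-grams of a text; empty if the text has fewer than n words."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlaps_benchmark(sample: str, benchmark_prompts: List[str], n: int = 8) -> bool:
    """True if the sample shares at least one n-gram with any benchmark prompt."""
    sample_grams = ngrams(sample, n)
    return any(sample_grams & ngrams(prompt, n) for prompt in benchmark_prompts)


def filter_contaminated(train: List[str], benchmark_prompts: List[str], n: int = 8) -> List[str]:
    """Keep only the training samples with no n-gram overlap against the benchmark."""
    return [s for s in train if not overlaps_benchmark(s, benchmark_prompts, n)]


# Hypothetical data, for illustration only.
benchmark = ["Write a function that returns the sum of two integers a and b"]
train = [
    "Please write a function that returns the sum of two integers quickly",
    "Implement binary search over a sorted list of numbers",
]
print(filter_contaminated(train, benchmark))
```

Exact n-gram matching misses paraphrased leaks, which is one reason a pipeline would combine several methods, as the text describes.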

⚠️ Important note

Please use exactly the same system prompt as we do. We do not guarantee the accuracy of quantized versions.

Default version:

"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"

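Because the prompt has to match exactly, it helps to keep it in one place and fill in only the instruction. A minimal sketch follows; the template string is copied from the default version above, while the names `PROMPT_TEMPLATE` and `build_prompt` are illustrative and not taken from the project's code.

```python
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(instruction: str) -> str:
    """Insert the user's instruction into the fixed prompt template."""
    return PROMPT_TEMPLATE.format(instruction=instruction)


print(build_prompt("Write a Python function that reverses a string."))
```

Braces inside the instruction are safe here, because `str.format` only interprets placeholders in the template, not in the substituted value.
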
💻 Reproducing the performance of WizardCoder-33B-V1.1

Dependencies

transformers==4.36.2
vllm==0.2.5

Code and results

We provide all the code here.
We also provide all the generated results here.

(1) HumanEval and HumanEval-Plus

Step 1: Code generation (without acceleration)

model="WizardLM/WizardCoder-33B-V1.1"
temp=0.0
max_len=2048
pred_num=1
num_seqs_per_iter=1

output_path=preds/T${temp}_N${pred_num}_WizardCoder-33B-V1.1_Greedy_Decode

mkdir -p ${output_path}
echo 'Output path: '$output_path
echo 'Model to eval: '$model

# 164 problems, 21 per GPU if GPU=8
index=0
gpu_num=8
for ((i = 0; i < $gpu_num; i++)); do
  start_index=$((i * 21))
  end_index=$(((i + 1) * 21))

  gpu=$((i))
  echo 'Running process #' ${i} 'from' $start_index 'to' $end_index 'on GPU' ${gpu}
  ((index++))
  (
    CUDA_VISIBLE_DEVICES=$gpu python humaneval_gen.py --model ${model} \
      --start_index ${start_index} --end_index ${end_index} --temperature ${temp} \
      --num_seqs_per_iter ${num_seqs_per_iter} --N ${pred_num} --max_len ${max_len} --output_path ${output_path} --greedy_decode
  ) &
  if (($index % $gpu_num == 0)); then wait; fi
done
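
The loop above gives each GPU a fixed slice of 21 problems, which is 164 divided by 8 and rounded up. A minimal sketch of that index arithmetic follows; `shard_ranges` is a hypothetical helper, not part of the project's scripts. Like the shell loop, it lets the last slice run past the final problem (168 instead of 164), on the assumption that the generation script tolerates an end index beyond the dataset size.

```python
from math import ceil
from typing import List, Tuple


def shard_ranges(num_problems: int, num_gpus: int) -> List[Tuple[int, int]]:
    """Split problems into equal-width [start, end) index ranges, one per GPU."""
    per_gpu = ceil(num_problems / num_gpus)
    return [(i * per_gpu, (i + 1) * per_gpu) for i in range(num_gpus)]


print(shard_ranges(164, 8)[0])   # (0, 21)
print(shard_ranges(164, 8)[-1])  # (147, 168)
print(shard_ranges(399, 8)[-1])  # (350, 400)
```

The same arithmetic yields the 50 problems per GPU used for the 399 MBPP problems.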

Step 1: Code generation (accelerated with vllm)

model="WizardLM/WizardCoder-33B-V1.1"
temp=0.0
max_len=2048
pred_num=1
num_seqs_per_iter=1

output_path=preds/T${temp}_N${pred_num}_WizardCoder-33B-V1.1_Greedy_Decode_vllm

mkdir -p ${output_path}
echo 'Output path: '$output_path
echo 'Model to eval: '$model

CUDA_VISIBLE_DEVICES=0,1,2,3 python humaneval_gen_vllm.py --model ${model} \
    --start_index 0 --end_index 164 --temperature ${temp} \
    --num_seqs_per_iter ${num_seqs_per_iter} --N ${pred_num} --max_len ${max_len} --output_path ${output_path} --num_gpus 4 --overwrite

Step 2: Get the score

Install the Eval-Plus benchmark:

git clone https://github.com/evalplus/evalplus.git
cd evalplus
export PYTHONPATH=$PYTHONPATH:$(pwd)
pip install -r requirements.txt

Get the HumanEval and HumanEval-Plus scores:

output_path=preds/T0.0_N1_WizardCoder-33B-V1.1_Greedy_Decode

echo 'Output path: '$output_path
python process_humaneval.py --path ${output_path} --out_path ${output_path}.jsonl --add_prompt

evalplus.evaluate --dataset humaneval --samples ${output_path}.jsonl
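
Before scoring, it can be worth checking the JSONL file. EvalPlus's documentation describes each sample line as a JSON object with a `task_id` plus either a `solution` or a `completion` field. Whether process_humaneval.py writes exactly these fields is an assumption here, and `check_samples` is a hypothetical helper rather than part of EvalPlus or this project.

```python
import json
from typing import Iterable, List


def check_samples(lines: Iterable[str]) -> List[str]:
    """Parse JSONL sample lines and return their task ids, raising on malformed records."""
    task_ids = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if "task_id" not in record:
            raise ValueError(f"line {lineno}: missing 'task_id'")
        if "solution" not in record and "completion" not in record:
            raise ValueError(f"line {lineno}: needs 'solution' or 'completion'")
        task_ids.append(record["task_id"])
    return task_ids


# Hypothetical sample line, for illustration only.
demo = ['{"task_id": "HumanEval/0", "solution": "def f():\\n    return 1\\n"}']
print(check_samples(demo))
```

To check a real file, pass an open file handle, for example `check_samples(open(path))`, and compare the number of returned ids with the expected 164.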

(2) MBPP and MBPP-Plus

The preprocessed problems are provided in mbppplus.json.

Step 1: Code generation (without acceleration)

model="WizardLM/WizardCoder-33B-V1.1"
temp=0.0
max_len=2048
pred_num=1
num_seqs_per_iter=1

output_path=preds/MBPP_T${temp}_N${pred_num}_WizardCoder-33B-V1.1_Greedy_Decode

mkdir -p ${output_path}
echo 'Output path: '$output_path
echo 'Model to eval: '$model

# 399 problems, 50 per GPU if GPU=8
index=0
gpu_num=8
for ((i = 0; i < $gpu_num; i++)); do
  start_index=$((i * 50))
  end_index=$(((i + 1) * 50))

  gpu=$((i))
  echo 'Running process #' ${i} 'from' $start_index 'to' $end_index 'on GPU' ${gpu}
  ((index++))
  (
    CUDA_VISIBLE_DEVICES=$gpu python mbppplus_gen.py --model ${model} \
      --start_index ${start_index} --end_index ${end_index} --temperature ${temp} \
      --num_seqs_per_iter ${num_seqs_per_iter} --N ${pred_num} --max_len ${max_len} --output_path ${output_path} --mbpp_path "mbppplus.json" --greedy_decode
  ) &
  if (($index % $gpu_num == 0)); then wait; fi
done

Step 1: Code generation (accelerated with vllm)

model="WizardLM/WizardCoder-33B-V1.1"
temp=0.0
max_len=2048
pred_num=1
num_seqs_per_iter=1

output_path=preds/MBPP_T${temp}_N${pred_num}_WizardCoder-33B-V1.1_Greedy_Decode_vllm

mkdir -p ${output_path}
echo 'Output path: '$output_path
echo 'Model to eval: '$model

# All 399 problems in a single vllm job across 4 GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3 python mbppplus_gen_vllm.py --model ${model} \
    --start_index 0 --end_index 399 --temperature ${temp} \
    --num_seqs_per_iter ${num_seqs_per_iter} --N ${pred_num} --max_len ${max_len} --output_path ${output_path} --mbpp_path "mbppplus.json" --num_gpus 4

Step 2: Get the score

Install the Eval-Plus benchmark:

git clone https://github.com/evalplus/evalplus.git
cd evalplus
export PYTHONPATH=$PYTHONPATH:$(pwd)
pip install -r requirements.txt

Get the MBPP and MBPP-Plus scores:

output_path=preds/MBPP_T0.0_N1_WizardCoder-33B-V1.1_Greedy_Decode

echo 'Output path: '$output_path
python mbppplus_process_preds.py --path ${output_path} --out_path ${output_path}.jsonl --add_prompt

evalplus.evaluate --dataset mbpp --samples ${output_path}.jsonl

📄 Citation

If you use the data, methods, or code from this repository, please cite it:

@article{luo2023wizardcoder,
  title={WizardCoder: Empowering Code Large Language Models with Evol-Instruct},
  author={Luo, Ziyang and Xu, Can and Zhao, Pu and Sun, Qingfeng and Geng, Xiubo and Hu, Wenxiang and Tao, Chongyang and Ma, Jing and Lin, Qingwei and Jiang, Daxin},
  journal={arXiv preprint arXiv:2306.08568},
  year={2023}
}