WizardCoder-33B-V1.1: A Free, Open-Source Code LLM with Strong Code Generation Performance

WizardCoder 15B V1.0

Developed by WizardLMTeam

WizardCoder-33B-V1.1 is an open-source code large language model trained from deepseek-coder-33b-base. It performs strongly on code generation benchmarks such as HumanEval.

Large language model

Transformers

License: other open-source license (OpenRAIL)  #code-generation #programming-assistance #large-language-model

Downloads: 521

Released: 6/14/2023

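Model overview

The WizardCoder model family focuses on code generation and code understanding. It uses the Evol-Instruct method to strengthen the model and reaches state-of-the-art (SOTA) results on several code generation benchmarks.

Model features

High-performance code generation

Reaches SOTA among open-source models on code generation benchmarks such as HumanEval and MBPP.

Evol-Instruct method

Uses the Evol-Instruct training method to improve the model's code understanding and generation.

Leading results across benchmarks

Performs strongly on HumanEval, HumanEval-Plus, MBPP, and MBPP-Plus.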

Model capabilities

Code generation

Code completion

Code understanding

Answering programming questions

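Use cases

Software development

Automated code generation

Generates runnable code from natural-language descriptions.

Reaches 79.9 pass@1 on the HumanEval benchmark (reported for WizardCoder-33B-V1.1).

Programming education

Helps students understand and learn programming concepts.

Technical interviews

Solving coding problems

Solves programming problems of the kind posed in technical interviews.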

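🚀 WizardCoder: Empowering Code Large Language Models with Evol-Instruct

WizardCoder is a code large language model built specifically for coding tasks. It adapts the Evol-Instruct method to tailor prompts to code-related instructions, then fine-tunes a Code LLM (such as StarCoder) on the resulting data. It performs well on several coding benchmarks and outperforms many open-source and closed-source models.

🚀 Quick start

To try WizardCoder quickly, use the online demo:

WizardCoder online demo

For fine-tuning, inference, or evaluation, continue with the sections below.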

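✨ Key features

High performance

Strong code generation: on the [HumanEval benchmark](https://github.com/openai/human-eval), WizardCoder-15B-V1.0 reaches 57.3 pass@1, 22.3 points higher than the SOTA open-source Code LLMs.
Outperforms closed-source models: on some benchmarks WizardCoder ranks third, ahead of Claude-Plus and Bard, while being much smaller than those closed-source models.

Customized training

Evol-Instruct method: Evol-Instruct is adapted to coding tasks by tailoring its prompts to code-related instructions, and the model is fine-tuned on the newly created instruction-following training set.

Multiple models

Besides the code generation model WizardCoder, the team has also released WizardMath, a mathematical reasoning model that performs well on math benchmarks.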

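📦 Installation

Fine-tuning environment

To fine-tune WizardCoder, set up the environment as follows:

Follow the [Llama-X](https://github.com/AetherCortex/Llama-X) instructions to install the environment, download the training code, and deploy. Note: deepspeed==0.9.2 and transformers==4.29.2 are required.
Replace train.py with train_wizardcoder.py from our repository (src/train_wizardcoder.py).
Log in to Hugging Face: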

huggingface-cli login

Inference environment

For inference, install jsonlines:

pip install jsonlines

Evaluation environment

To evaluate the model, set up the environment by following the [HumanEval](https://github.com/openai/human-eval) instructions.

💻 Usage examples

Fine-tuning

The following command fine-tunes WizardCoder:

deepspeed train_wizardcoder.py \
    --model_name_or_path "bigcode/starcoder" \
    --data_path "/your/path/to/code_instruction_data.json" \
    --output_dir "/your/path/to/ckpt" \
    --num_train_epochs 3 \
    --model_max_length 2048 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50 \
    --save_total_limit 2 \
    --learning_rate 2e-5 \
    --warmup_steps 30 \
    --logging_steps 2 \
    --lr_scheduler_type "cosine" \
    --report_to "tensorboard" \
    --gradient_checkpointing True \
    --deepspeed configs/deepspeed_config.json \
    --fp16 True

Inference

The following command runs inference:

python src/inference_wizardcoder.py \
    --base_model "/your/path/to/ckpt" \
    --input_data_path "/your/path/to/input/data.jsonl" \
    --output_data_path "/your/path/to/output/result.jsonl"

The format of data.jsonl is:

{"idx": 11, "Instruction": "Write a Python code to count 1 to 10."}
{"idx": 12, "Instruction": "Write a Java code to sum 1 to 10."}
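As a minimal sketch of how such an input file could be produced, the snippet below writes one JSON object per line using the `idx` and `Instruction` keys shown above. The helper names `write_instructions` and `read_instructions` are hypothetical and not part of the WizardCoder repository, and the standard-library `json` module is used here in place of `jsonlines` so the example has no extra dependencies.

```python
import json
from pathlib import Path


def write_instructions(path, instructions, start_idx=0):
    """Write one JSON object per line with "idx" and "Instruction" keys.

    Hypothetical helper: only the two key names come from the example
    data.jsonl above; everything else is an illustrative assumption.
    """
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for offset, text in enumerate(instructions):
            record = {"idx": start_idx + offset, "Instruction": text}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


def read_instructions(path):
    """Read the file back as a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Calling `write_instructions("data.jsonl", ["Write a Python code to count 1 to 10."], start_idx=11)` would produce a file whose first line matches the first record above; the resulting path can then be passed as `--input_data_path`.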

Evaluation

To evaluate on HumanEval, follow these steps:

Generate the answers:

model="/path/to/your/model"
temp=0.2
max_len=2048
pred_num=200
num_seqs_per_iter=2

output_path=preds/T${temp}_N${pred_num}

mkdir -p ${output_path}
echo 'Output path: '$output_path
echo 'Model to eval: '$model

# 164 problems, 21 per GPU if GPU=8
index=0
gpu_num=8
for ((i = 0; i < $gpu_num; i++)); do
  start_index=$((i * 21))
  end_index=$(((i + 1) * 21))

  gpu=$((i))
  echo 'Running process #' ${i} 'from' $start_index 'to' $end_index 'on GPU' ${gpu}
  ((index++))
  (
    CUDA_VISIBLE_DEVICES=$gpu python humaneval_gen.py --model ${model} \
      --start_index ${start_index} --end_index ${end_index} --temperature ${temp} \
      --num_seqs_per_iter ${num_seqs_per_iter} --N ${pred_num} --max_len ${max_len} --output_path ${output_path}
  ) &
  if (($index % $gpu_num == 0)); then wait; fi
done

Run the post-processing script src/process_humaneval.py to collect the code completions from all answer files:

output_path=preds/T${temp}_N${pred_num}

echo 'Output path: '$output_path
python process_humaneval.py --path ${output_path} --out_path ${output_path}.jsonl --add_prompt

evaluate_functional_correctness ${output_path}.jsonl
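The generation loop above splits the 164 HumanEval problems across 8 GPUs in blocks of 21, so the last block nominally runs from 147 to 168. The sketch below reproduces that index arithmetic for an arbitrary problem count and GPU count. The function name `shard_ranges` is hypothetical, and clamping the final end index to the number of problems is an assumption of this sketch; the shell script itself passes the unclamped value to humaneval_gen.py.

```python
def shard_ranges(num_problems, num_gpus):
    """Return (start_index, end_index) pairs, one per GPU.

    Mirrors the shell loop: each GPU gets a contiguous block whose size is
    the ceiling of num_problems / num_gpus. Unlike the shell loop, the end
    index is clamped to num_problems (an assumption made for clarity).
    """
    if num_problems <= 0 or num_gpus <= 0:
        raise ValueError("num_problems and num_gpus must be positive")
    per_gpu = (num_problems + num_gpus - 1) // num_gpus  # ceiling division
    ranges = []
    for i in range(num_gpus):
        start = min(i * per_gpu, num_problems)
        end = min((i + 1) * per_gpu, num_problems)
        ranges.append((start, end))
    return ranges
```

For `shard_ranges(164, 8)` the first pair is `(0, 21)` and the last is `(147, 164)`, which matches the "21 per GPU if GPU=8" comment in the script.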

📚 Detailed documentation

Model information

Attribute	Details
Model type	Code large language model
Training data	78k evolved code instructions

Model comparison

WizardCoder vs. closed-source models

WizardCoder ranks third on this benchmark, ahead of Claude-Plus (59.8 vs. 53.0) and Bard (59.8 vs. 44.5), while being much smaller than those models.

WizardCoder vs. open-source models

Model	HumanEval Pass@1	MBPP Pass@1
CodeGen-16B-Multi	18.3	20.9
CodeGeeX	22.9	24.4
LLaMA-33B	21.7	30.2
LLaMA-65B	23.7	37.7
PaLM-540B	26.2	36.8
PaLM-Coder-540B	36.0	47.0
PaLM 2-S	37.6	50.0
CodeGen-16B-Mono	29.3	35.3
Code-Cushman-001	33.5	45.9
StarCoder-15B	33.6	43.6*
InstructCodeT5+	35.0	--
WizardLM-30B 1.0	37.8	--
WizardCoder-15B 1.0	57.3	51.8

Note: the StarCoder result on MBPP (marked *) is a reproduced result. The table compares WizardCoder with other models on the HumanEval and MBPP benchmarks. For each problem, 20 samples are generated to estimate the pass@1 score, and all models are evaluated with the same [code](https://github.com/openai/human-eval/tree/master).
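To illustrate how a pass@1 score can be estimated from 20 samples per problem, the sketch below uses the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number that pass the tests. It is assumed here that the evaluation code linked above computes an equivalent quantity; the function names are hypothetical and chosen for illustration only.

```python
import math


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed, k: budget being scored.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset contains at least one passing sample.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def mean_pass_at_k(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimator reduces to c / n, so a problem where 5 of 20 samples pass contributes 0.25 to the average.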

News

[2024/01/04] Released WizardCoder-33B-V1.1, trained from deepseek-coder-33b-base. It is the SOTA open-source Code LLM on the EvalPlus leaderboard, reaching 79.9 pass@1 on HumanEval, 73.2 pass@1 on HumanEval-Plus, 78.9 pass@1 on MBPP, and 66.9 pass@1 on MBPP-Plus.
[2024/01/04] WizardCoder-33B-V1.1 outperforms ChatGPT 3.5, Gemini Pro, and DeepSeek-Coder-33B-instruct on HumanEval and HumanEval-Plus pass@1.
[2024/01/04] WizardCoder-33B-V1.1 is comparable to ChatGPT 3.5 and surpasses Gemini Pro on MBPP and MBPP-Plus pass@1.
[08/11/2023] Released the WizardMath models.
WizardMath-70B-V1.0 slightly outperforms several closed-source LLMs on the GSM8K benchmark, including ChatGPT 3.5, Claude Instant 1, and PaLM 2 540B.
WizardMath-70B-V1.0 reaches 81.6 pass@1 on the [GSM8k benchmark](https://github.com/openai/grade-school-math), 24.8 points higher than the SOTA open-source LLM.
WizardMath-70B-V1.0 reaches 22.7 pass@1 on the MATH benchmark, 9.2 points higher than the SOTA open-source LLM.

Model metrics

Model	Checkpoint	Paper	HumanEval	HumanEval+	MBPP	MBPP+	License
GPT-4-Turbo (Nov 2023)	-	-	85.4	81.7	83.0	70.7	-
GPT-4 (May 2023)	-	-	88.4	76.8	-	-	-
GPT-3.5-Turbo (Nov 2023)	-	-	72.6	65.9	81.7	69.4	-
Gemini Pro	-	-	63.4	55.5	72.9	57.9	-
DeepSeek-Coder-33B-instruct	-	-	78.7	72.6	78.7	66.7	-
WizardCoder-33B-V1.1	🤗 [HF link](https://huggingface.co/WizardLM/WizardCoder-33B-V1.1)	📃 WizardCoder	79.9	73.2	78.9	66.9	[MSFTResearch](https://huggingface.co/WizardLM/WizardMath-7B-V1.1/resolve/main/LICENSE)
WizardCoder-Python-34B-V1.0	🤗 [HF link](https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0)	📃 WizardCoder	73.2	64.6	73.2	59.9	[Llama2](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)
WizardCoder-15B-V1.0	🤗 [HF link](https://huggingface.co/WizardLM/WizardCoder-15B-V1.0)	📃 WizardCoder	59.8	52.4	--	--	[OpenRAIL-M](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement)
WizardCoder-Python-13B-V1.0	🤗 [HF link](https://huggingface.co/WizardLM/WizardCoder-Python-13B-V1.0)	📃 WizardCoder	64.0	--	--	--	[Llama2](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)
WizardCoder-Python-7B-V1.0	🤗 [HF link](https://huggingface.co/WizardLM/WizardCoder-Python-7B-V1.0)	📃 WizardCoder	55.5	--	--	--	[Llama2](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)
WizardCoder-3B-V1.0	🤗 [HF link](https://huggingface.co/WizardLM/WizardCoder-3B-V1.0)	📃 WizardCoder	34.8	--	--	--	[OpenRAIL-M](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement)
WizardCoder-1B-V1.0	🤗 [HF link](https://huggingface.co/WizardLM/WizardCoder-1B-V1.0)	📃 WizardCoder	23.8	--	--	--	[OpenRAIL-M](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement)

Metrics for other models

Model	Checkpoint	Paper	GSM8k	MATH	Online demo	License
WizardMath-70B-V1.0	🤗 [HF link](https://huggingface.co/WizardLM/WizardMath-70B-V1.0)	📃 WizardMath	81.6	22.7	Demo	[Llama 2](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)
WizardMath-13B-V1.0	🤗 [HF link](https://huggingface.co/WizardLM/WizardMath-13B-V1.0)	📃 WizardMath	63.9	14.0	Demo	[Llama 2](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)
WizardMath-7B-V1.0	🤗 [HF link](https://huggingface.co/WizardLM/WizardMath-7B-V1.0)	📃 WizardMath	54.9	10.7	Demo	[Llama 2](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)

Model	Checkpoint	Paper	MT-Bench	AlpacaEval	WizardEval	HumanEval	License
WizardLM-13B-V1.2	🤗 [HF link](https://huggingface.co/WizardLM/WizardLM-13B-V1.2)		7.06	89.17%	101.4%	36.6 pass@1	[Llama 2 License](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)
WizardLM-13B-V1.1	🤗 [HF link](https://huggingface.co/WizardLM/WizardLM-13B-V1.1)		6.76	86.32%	99.3%	25.0 pass@1	Non-commercial
WizardLM-30B-V1.0	🤗 [HF link](https://huggingface.co/WizardLM/WizardLM-30B-V1.0)		7.01		97.8%	37.8 pass@1	Non-commercial
WizardLM-13B-V1.0	🤗 [HF link](https://huggingface.co/WizardLM/WizardLM-13B-V1.0)		6.35	75.31%	89.1%	24.0 pass@1	Non-commercial
WizardLM-7B-V1.0	🤗 [HF link](https://huggingface.co/WizardLM/WizardLM-7B-V1.0)	📃 WizardLM			78.0%	19.1 pass@1	Non-commercial

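🔧 Technical details

To build WizardCoder, the Evol-Instruct method is first adapted to coding tasks by tailoring its prompts to code-related instructions. The Code LLM StarCoder is then fine-tuned on the newly created instruction-following training set.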

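📄 License

WizardCoder follows the same license as StarCoder. Content generated by any version of WizardCoder is affected by uncontrollable factors such as randomness, so this project cannot guarantee the accuracy of the output. This project accepts no legal liability for the content of model outputs and assumes no responsibility for any losses arising from use of the associated resources and output results.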

Citation

If you use the data, methods, or code from this repository, please cite the following paper:

@article{luo2023wizardcoder,
  title={WizardCoder: Empowering Code Large Language Models with Evol-Instruct},
  author={Luo, Ziyang and Xu, Can and Zhao, Pu and Sun, Qingfeng and Geng, Xiubo and Hu, Wenxiang and Tao, Chongyang and Ma, Jing and Lin, Qingwei and Jiang, Daxin},
  journal={arXiv preprint arXiv:2306.08568},
  year={2023}
}

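Call for feedback

We welcome everyone to evaluate WizardCoder with professional, difficult instructions and to share examples of poor performance, along with suggestions, in the issue discussion area. We are currently focused on improving Evol-Instruct and hope to address the existing shortcomings in the next version of WizardCoder. After that, we will open-source the code and pipeline of the latest Evol-Instruct algorithm and work with you to improve it.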