🚀 Skywork o1 Open Model Series
The Skywork o1 Open model series, developed by the Skywork team at Kunlun Inc., innovatively incorporates o1-style slow thinking and reasoning capabilities, bringing a new approach to the field and significantly improving performance on reasoning tasks.
🚀 Quick Start
For detailed usage instructions for the PRM, please refer to the Skywork-o1-PRM-Inference repository.
Hugging Face Inference
- Clone the Skywork PRM inference repository
git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference
- Run PRM inference
from transformers import AutoTokenizer
from model_utils.prm_model import PRM_MODEL
from model_utils.io_utils import prepare_input, prepare_batch_input_for_model, derive_step_rewards
prm_model_path = "/path/to/prm_model"
tokenizer = AutoTokenizer.from_pretrained(prm_model_path, trust_remote_code=True)
datas = [
{
"problem" : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
"response" : "To determine how much money Janet makes every day at the farmers' market, we need to follow these steps:\n1. **Calculate the total number of eggs laid by the ducks per day.**\n Janet's ducks lay 16 eggs per day.\n2. **Determine the number of eggs Janet uses each day.**\n - She eats 3 eggs for breakfast every morning.\n - She bakes muffins for her friends every day with 4 eggs.\n So, the total number of eggs used per day is:\n \\[\n 3 + 4 = 7 \\text{ eggs}\n \\]\n3. **Calculate the number of eggs Janet sells at the farmers' market each day.**\n Subtract the number of eggs used from the total number of eggs laid:\n \\[\n 16 - 7 = 9 \\text{ eggs}\n \\]\n4. **Determine how much money Janet makes from selling the eggs.**\n She sells each egg for $2, so the total amount of money she makes is:\n \\[\n 9 \\times 2 = 18 \\text{ dollars}\n \\]\nTherefore, the amount of money Janet makes every day at the farmers' market is $\\boxed{18}$."
},
{
"problem" : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
"response" : "To find out how much Janet makes every day at the farmers' market, we need to follow these steps:\n1. Calculate the total number of eggs laid by the ducks: \n - Janet's ducks lay 16 eggs per day.\n2. Calculate the total number of eggs consumed by Janet and used for baking:\n - Janet eats 3 eggs for breakfast.\n - She bakes 4 eggs for muffins.\n - Total eggs used: 3 (breakfast) + 4 (baking) = 7 eggs.\n3. Calculate the remaining number of eggs for sale:\n - Total eggs laid: 16\n - Eggs used: 7\n - Remaining eggs: 16 - 7 = 9 eggs\n4. Calculate the total amount of money made at the farmers' market:\n - Price per egg: $2\n - Number of eggs sold: 9\n - Total money made: 9 * $2 = $18\nTherefore, Janet makes $\\boxed{18}$ dollars every day at the farmers' market."
}
]
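# Split each response into steps at step_token="\n" and encode it together with its problem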
processed_data = [prepare_input(d["problem"], d["response"], tokenizer=tokenizer, step_token="\n") for d in datas]
input_ids, steps, reward_flags = zip(*processed_data)
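# Load the PRM and pad the variable-length sequences into a single batch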
model = PRM_MODEL.from_pretrained(prm_model_path, device_map="auto").eval()
input_ids, attention_mask, reward_flags = prepare_batch_input_for_model(input_ids, reward_flags, tokenizer.pad_token_id)
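# Forward pass; return_probs=True yields reward scores as probabilities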
_, _, rewards = model(input_ids=input_ids, attention_mask=attention_mask, return_probs=True)
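# Collapse token-level rewards into one score per reasoning step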
step_rewards = derive_step_rewards(rewards, reward_flags)
print("step_rewards:",step_rewards[0])
print("step_rewards:",step_rewards[1])
vLLM Server Inference
- Install vllm and the vllm PRM plugin
pip install vllm==v0.6.4.post1
git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference
pip install -e .
- Start the vllm server
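# Serve the PRM across the four visible GPUs; --tensor-parallel-size should match the number of devices in CUDA_VISIBLE_DEVICES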
CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve /path/to/prm_model \
--host 0.0.0.0 \
--port 8081 \
--tensor-parallel-size 4 \
--gpu-memory-utilization 0.9 \
--enable-prefix-caching \
--dtype auto
- Send an inference request to the server
from openai import OpenAI
from transformers import AutoTokenizer
from model_utils.io_utils import prepare_input, derive_step_rewards_vllm
prm_model_path = "/path/to/prm_model"
tokenizer = AutoTokenizer.from_pretrained(prm_model_path, trust_remote_code=True)
datas = [
{
"problem" : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
"response" : "To determine how much money Janet makes every day at the farmers' market, we need to follow these steps:\n1. **Calculate the total number of eggs laid by the ducks per day.**\n Janet's ducks lay 16 eggs per day.\n2. **Determine the number of eggs Janet uses each day.**\n - She eats 3 eggs for breakfast every morning.\n - She bakes muffins for her friends every day with 4 eggs.\n So, the total number of eggs used per day is:\n \\[\n 3 + 4 = 7 \\text{ eggs}\n \\]\n3. **Calculate the number of eggs Janet sells at the farmers' market each day.**\n Subtract the number of eggs used from the total number of eggs laid:\n \\[\n 16 - 7 = 9 \\text{ eggs}\n \\]\n4. **Determine how much money Janet makes from selling the eggs.**\n She sells each egg for $2, so the total amount of money she makes is:\n \\[\n 9 \\times 2 = 18 \\text{ dollars}\n \\]\nTherefore, the amount of money Janet makes every day at the farmers' market is $\\boxed{18}$."
},
{
"problem" : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
"response" : "To find out how much Janet makes every day at the farmers' market, we need to follow these steps:\n1. Calculate the total number of eggs laid by the ducks: \n - Janet's ducks lay 16 eggs per day.\n2. Calculate the total number of eggs consumed by Janet and used for baking:\n - Janet eats 3 eggs for breakfast.\n - She bakes 4 eggs for muffins.\n - Total eggs used: 3 (breakfast) + 4 (baking) = 7 eggs.\n3. Calculate the remaining number of eggs for sale:\n - Total eggs laid: 16\n - Eggs used: 7\n - Remaining eggs: 16 - 7 = 9 eggs\n4. Calculate the total amount of money made at the farmers' market:\n - Price per egg: $2\n - Number of eggs sold: 9\n - Total money made: 9 * $2 = $18\nTherefore, Janet makes $\\boxed{18}$ dollars every day at the farmers' market."
}
]
# Preprocess the data: split each response into steps and encode it with its problem
processed_data = [prepare_input(d["problem"], d["response"], tokenizer=tokenizer, step_token="\n") for d in datas]
input_ids, steps, reward_flags = zip(*processed_data)
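# The PRM plugin exposes rewards through the server's OpenAI-compatible API, so a standard OpenAI client is used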
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8081/v1"
client = OpenAI(
# defaults to os.environ.get("OPENAI_API_KEY")
api_key=openai_api_key,
base_url=openai_api_base,
)
models = client.models.list()
model = models.data[0].id
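# Rewards for the pre-tokenized inputs are returned via the embeddings endpoint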
rewards = client.embeddings.create(
input=input_ids,
model=model,
)
step_rewards = derive_step_rewards_vllm(rewards, reward_flags)
print("step_rewards:",step_rewards[0])
print("step_rewards:",step_rewards[1])
✨ Key Features
- Innovative reasoning: incorporates o1-style slow thinking and reasoning, exhibiting planning and reflection in its outputs and significantly improving reasoning skills.
- Multiple model options: three advanced models trained from different base models, covering reasoning tasks of different scales and complexity.
- Strong performance: substantial gains in reasoning on standard benchmarks, pushing base models toward state-of-the-art performance on reasoning tasks.
📦 Installation
Hugging Face Inference
git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference
vLLM Server Inference
pip install vllm==v0.6.4.post1
git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference
pip install -e .
📚 Documentation
Model Introduction
The Skywork o1 Open model series, developed by the Skywork team at Kunlun Inc., includes three advanced models:
- Skywork o1 Open-Llama-3.1-8B: a strong chat model trained from Llama-3.1-8B, with reasoning markedly enhanced by "o1-style" data.
- Skywork o1 Open-PRM-Qwen-2.5-1.5B: a specialized model designed to improve reasoning through incremental process rewards, suited to complex problem solving at a small scale.
- Skywork o1 Open-PRM-Qwen-2.5-7B: scaled up from the 1.5B model to tackle more demanding reasoning tasks.
Model Information
The Skywork-o1-Open-PRM series is trained from Qwen2.5-Math-1.5B-Instruct and Qwen2.5-Math-7B-Instruct.
PRM Evaluation
Evaluation Settings
- Math evaluation: uses the Qwen2.5-Math evaluation scripts with their configuration. The selected datasets are GSM8K, MATH, GaoKao, CN-Middle School 24, OlympiadBench, AMC-23, and AIME-24. GaoKao and CN-Middle School 24 are Chinese datasets; the others are English. OlympiadBench, AIME-24, and AMC-23 are competition-level datasets.
- Code evaluation: uses the Qwen2.5-Coder evaluation scripts with the same configuration. The selected datasets are HumanEval, MBPP, and LiveCodeBench, with LiveCodeBench restricted to the 2024.01-2024.11 window. Because earlier releases had issues with testing and code sanitization, the latest version of evalplus (0.3.1) is used.
Evaluated Base Models
RM performance is evaluated on three base models: Qwen2.5-7B-Instruct, Llama3.1-8B-Instruct, and Skywork-o1-Open-8B. Responses are sampled from each model to verify RM performance across models, with a sampling temperature of 0.7 for math problems and 1.0 for code-related tasks.
Compared Reward Models
- Qwen2.5-Math-RM-72B: an open-source ORM released by the Qwen team.
- OpenR-MATH-psa-PRM-7B: an open-source PRM from the OpenR project.
- RLHFlow-Deepseek-Data-PRM-8B: an open-source PRM from the RLHFlow project.
Evaluation Metrics
- Greedy Sampling Pass@1: generate a single answer with greedy decoding.
- Majority Voting@64: randomly sample 64 answers and take the majority vote over their final answers.
- Best-of-N@64: rank 64 randomly sampled answers by the scores the reward model (RM) assigns to them. ORMs and PRMs aggregate rewards differently (see the sketch after this list):
  - ORM: only the reward of the final step is used.
  - PRM: the average reward over all steps is used.
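To make the two aggregation schemes concrete, here is a minimal sketch (illustrative only, not the official evaluation code); the candidates structure and the orm_score/prm_score/best_of_n helpers are hypothetical, and the per-step reward lists stand in for the output of derive_step_rewards shown above.
# Minimal Best-of-N sketch (illustrative only; not the official evaluation script).
# Each candidate pairs an answer with its per-step rewards, e.g. the lists
# returned by derive_step_rewards in the examples above.
def orm_score(step_rewards):
    # ORM-style scoring: use only the reward of the final step.
    return step_rewards[-1]

def prm_score(step_rewards):
    # PRM-style scoring: average the rewards over all steps.
    return sum(step_rewards) / len(step_rewards)

def best_of_n(candidates, use_prm=True):
    # Rank the sampled answers by their RM score and keep the highest-scoring one.
    score = prm_score if use_prm else orm_score
    return max(candidates, key=lambda c: score(c[1]))[0]

candidates = [
    ("answer A", [0.71, 0.64, 0.80, 0.77]),
    ("answer B", [0.55, 0.90]),
]
print(best_of_n(candidates, use_prm=True))   # PRM weighting picks "answer A"
print(best_of_n(candidates, use_prm=False))  # ORM weighting picks "answer B"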
Evaluation Results
Math Evaluation
Base Model | Reward Model | Method | GSM8K | MATH | GaoKao | CN-Middle School 24 | OlympiadBench | AIME-24 | AMC-23 | Average |
---|---|---|---|---|---|---|---|---|---|---|
Skywork-o1-Open-8B | N/A | Greedy Sampling Pass@1 | 91.6 | 78.1 | 63.6 | 67.3 | 43.1 | 13.3 | 55.0 | 58.9 |
Skywork-o1-Open-8B | N/A | Majority Voting@64 | 93.9 | 84.3 | 69.5 | 73.3 | 50.4 | 16.7 | 52.5 | 62.9 |
Skywork-o1-Open-8B | OpenR-MATH-psa-PRM-7B | Best-of-N@64 | 95.1 | 82.7 | 67.1 | 70.3 | 47.6 | 20.0 | 57.5 | 62.9 |
Skywork-o1-Open-8B | RLHFlow-Deepseek-Data-PRM-8B | Best-of-N@64 | 94.4 | 80.1 | 59.1 | 74.3 | 43.0 | 20.0 | 50.0 | 60.1 |
Skywork-o1-Open-8B | Qwen2.5-Math-RM-72B | Best-of-N@64 | 96.1 | 86.9 | 76.0 | 76.2 | 53.3 | 26.7 | 65.0 | 68.6 |
Skywork-o1-Open-8B | Skywork-o1-Open-PRM-1.5B | Best-of-N@64 | 94.5 | 85.0 | 65.6 | 73.3 | 49.9 | 16.7 | 62.5 | 63.9 |
Skywork-o1-Open-8B | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 96.7 | 87.0 | 70.3 | 76.2 | 52.3 | 23.3 | 65.0 | 67.3 |
Qwen2.5-7B-Instruct | N/A | Greedy Sampling Pass@1 | 91.9 | 75.2 | 55.6 | 75.2 | 39.1 | 13.3 | 45.0 | 56.5 |
Qwen2.5-7B-Instruct | N/A | Majority Voting@64 | 93.5 | 78.4 | 55.3 | 78.2 | 40.1 | 13.3 | 62.5 | 60.2 |
Qwen2.5-7B-Instruct | OpenR-MATH-psa-PRM-7B | Best-of-N@64 | 93.9 | 77.9 | 52.4 | 73.3 | 40.7 | 10.0 | 55.0 | 57.6 |
Qwen2.5-7B-Instruct | RLHFlow-Deepseek-Data-PRM-8B | Best-of-N@64 | 94.1 | 78.1 | 53.2 | 75.2 | 39.1 | 16.7 | 55.0 | 58.8 |
Qwen2.5-7B-Instruct | Qwen2.5-Math-RM-72B | Best-of-N@64 | 94.8 | 82.4 | 65.2 | 80.2 | 45.0 | 13.3 | 62.5 | 63.4 |
Qwen2.5-7B-Instruct | Skywork-o1-Open-PRM-1.5B | Best-of-N@64 | 93.3 | 79.8 | 56.1 | 74.3 | 43.9 | 10.0 | 67.5 | 60.7 |
Qwen2.5-7B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 94.9 | 81.9 | 56.3 | 75.2 | 44.9 | 13.3 | 65.0 | 61.6 |
Llama3.1-8B-Instruct | N/A | Greedy Sampling Pass@1 | 85.3 | 49.7 | 25.3 | 47.5 | 16.6 | 6.7 | 27.5 | 36.9 |
Llama3.1-8B-Instruct | N/A | Majority Voting@64 | 90.9 | 62.9 | 28.0 | 56.4 | 26.4 | 13.3 | 37.5 | 45.1 |
Llama3.1-8B-Instruct | OpenR-MATH-psa-PRM-7B | Best-of-N@64 | 91.8 | 59.4 | 24.7 | 47.5 | 23.0 | 13.3 | 35.0 | 42.1 |
Llama3.1-8B-Instruct | RLHFlow-Deepseek-Data-PRM-8B | Best-of-N@64 | 89.8 | 56.1 | 24.0 | 40.6 | 20.4 | 0.0 | 35.0 | 38.0 |
Llama3.1-8B-Instruct | Qwen2.5-Math-RM-72B | Best-of-N@64 | 94.9 | 72.5 | 44.9 | 65.3 | 34.4 | 23.3 | 60.0 | 56.5 |
Llama3.1-8B-Instruct | Skywork-o1-Open-PRM-1.5B | Best-of-N@64 | 91.7 | 65.6 | 26.8 | 49.5 | 27.0 | 16.7 | 60.0 | 48.2 |
Llama3.1-8B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 94.0 | 69.8 | 32.0 | 56.4 | 29.9 | 16.7 | 52.5 | 50.2 |
Code Evaluation
Because the compared PRMs were not trained on code-related tasks, this section focuses only on the performance of Skywork-o1-Open-PRM.
Base Model | Reward Model | Method | MBPP | MBPP+ | HumanEval | HumanEval+ | LiveCodeBench (2024.01-2024.11) |
---|---|---|---|---|---|---|---|
Skywork-o1-Open-8B | N/A | Greedy Sampling Pass@1 | 79.9 | 65.9 | 82.9 | 78.7 | 26.0 |
Skywork-o1-Open-8B | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 81.2 | 68.5 | 81.1 | 74.4 | 31.3 |
Qwen2.5-7B-Instruct | N/A | Greedy Sampling Pass@1 | 81.7 | 69.3 | 84.8 | 78.0 | 25.3 |
Qwen2.5-7B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 84.9 | 72.5 | 83.5 | 78.0 | 30.7 |
Llama3.1-8B-Instruct | N/A | Greedy Sampling Pass@1 | 75.7 | 64.0 | 68.3 | 63.4 | 16.1 |
Llama3.1-8B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 77.8 | 66.1 | 73.8 | 67.7 | 23.3 |
🔧 Technical Details
- Evaluation scripts: math evaluation uses the Qwen2.5-Math evaluation scripts; code evaluation uses the Qwen2.5-Coder evaluation scripts.
- Data sampling: responses are sampled from each base model with temperature 0.7 for math problems and 1.0 for code-related tasks.
- Reward model weighting: the ORM uses only the final-step reward, while the PRM uses the average reward over all steps.
📄 License
Community use of the Skywork models must comply with the Skywork Community License. The models support commercial use; if you plan to use Skywork models or their derivatives for commercial purposes, you must abide by the terms and conditions of the Skywork Community License.
Disclaimer
We declare that the Skywork models must not be used for any activity that endangers national or public security or for any illegal purpose. In addition, we ask users not to deploy the Skywork models in internet-facing services without appropriate security review and documentation. We hope all users follow this principle so that the technology develops in a regulated and lawful environment.
We have made every effort to ensure the compliance of the data used during training. However, given the complexity of models and data, unforeseeable risks and issues may still exist. We therefore accept no liability for any problems arising from the use of the Skywork open-source models, including but not limited to data security issues, public-opinion risks, or any risks and problems caused by the models being misled, abused, disseminated, or otherwise improperly used.
Citation
If you find our work helpful, please cite us using the following BibTeX entry:
@misc{skyworkopeno12024,
title={Skywork-o1 Open Series},
author={He, Jujie and Wei, Tianwen and Yan, Rui and Liu, Jiacai and Wang, Chaojie and Gan, Yimeng and Tu, Shiwen and Liu, Chris Yuhao and Zeng, Liang and Wang, Xiaokun and Wang, Boyang and Li, Yongcong and Zhang, Fuxiang and Xu, Jiacheng and An, Bo and Liu, Yang and Zhou, Yahui},
year={2024},
month={November},
howpublished={\url{https://huggingface.co/Skywork}},
url={https://huggingface.co/Skywork},
}
TODO
- Add results from more step-wise RM evaluation methods.
- Adapt the RM architecture for better compatibility with vLLM/sglang inference.
- Expand the RM's use cases by covering more types of reasoning tasks.
- Mitigate performance conflicts across different reasoning tasks.
Contact Us
If you have any questions, please feel free to contact us at {jujie.he, jiacai.liu}@kunlun-inc.com.



