Skywork-o1-Open-PRM-Qwen-2.5-7B Open-Source Model: Incremental Process Rewards for Reasoning, a Practical Choice for Free Deployment

Skywork O1 Open PRM Qwen 2.5 7B

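Developed by Skywork

The 7B-parameter model in the Skywork o1 Open series. It is a process reward model (PRM) trained from Qwen2.5-Math-7B-Instruct, and it strengthens reasoning through incremental process rewards.

Large language model

PyTorch

License: other #EnhancedMathReasoning #IncrementalRewardModel #CompetitionLevelProblemSolving

Downloads: 3,608

Release date: 11/26/2024

Model Overview

This model belongs to the Skywork o1 Open series developed by the Skywork team. It strengthens reasoning through incremental process rewards and is suited to demanding, complex problem-solving tasks.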

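Model Features

Incremental process reward mechanism

Uses a Process Reward Model (PRM), which assigns a reward to each intermediate reasoning step, to make the reasoning process more reliable

Enhanced mathematical reasoning

When used to rerank 64 sampled answers (Best-of-N@64), it performs strongly on math datasets such as GSM8K and MATH

Competition-level problem solving

Evaluated on competition-level math benchmarks, including AMC-23, AIME-24, and OlympiadBench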

Model Capabilities

Math problem solving

Code generation

Multi-turn dialogue

Complex reasoning

Competition-level math problem solving

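Use Cases

Education

Math tutoring

Helps students solve math problems ranging from basic to competition level

Strong results on GSM8K and MATH (96.7 and 87.0 with Best-of-N@64)

Programming assistance

Code generation and optimization

Generates code from natural-language descriptions

With Best-of-N@64 reranking, it improves on greedy Pass@1 for MBPP, MBPP+, and LiveCodeBench, but scores lower on HumanEval and HumanEval+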

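🚀 Skywork o1 Open Model Series

The Skywork o1 Open model series, developed by the Skywork team at Kunlun Inc., incorporates o1-like slow thinking and reasoning capabilities. The series comprises three models that deliver strong results on reasoning tasks.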

🤗 Hugging Face • 🤖 ModelScope

🚀 Quick Start

For detailed instructions on using the PRM, see the Skywork-o1-PRM-Inference repository.

Hugging Face inference

Clone the Skywork PRM inference repository

git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference

Run PRM inference

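# Run from the root of the cloned repository so that model_utils can be imported
from transformers import AutoTokenizer
from model_utils.prm_model import PRM_MODEL
from model_utils.io_utils import prepare_input, prepare_batch_input_for_model, derive_step_rewards

prm_model_path = "/path/to/prm_model"  # path to the downloaded PRM checkpoint
tokenizer = AutoTokenizer.from_pretrained(prm_model_path, trust_remote_code=True)
# Two candidate solutions to the same GSM8K problem, to be scored step by step
datas = [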
    {
        "problem"  : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
        "response" : "To determine how much money Janet makes every day at the farmers' market, we need to follow these steps:\n1. **Calculate the total number of eggs laid by the ducks per day.**\n   Janet's ducks lay 16 eggs per day.\n2. **Determine the number of eggs Janet uses each day.**\n   - She eats 3 eggs for breakfast every morning.\n   - She bakes muffins for her friends every day with 4 eggs.\n   So, the total number of eggs used per day is:\n   \\[\n   3 + 4 = 7 \\text{ eggs}\n   \\]\n3. **Calculate the number of eggs Janet sells at the farmers' market each day.**\n   Subtract the number of eggs used from the total number of eggs laid:\n   \\[\n   16 - 7 = 9 \\text{ eggs}\n   \\]\n4. **Determine how much money Janet makes from selling the eggs.**\n   She sells each egg for $2, so the total amount of money she makes is:\n   \\[\n   9 \\times 2 = 18 \\text{ dollars}\n   \\]\nTherefore, the amount of money Janet makes every day at the farmers' market is $\\boxed{18}$."
    },
    {
        "problem"  : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
        "response" : "To find out how much Janet makes every day at the farmers' market, we need to follow these steps:\n1. Calculate the total number of eggs laid by the ducks: \n   - Janet's ducks lay 16 eggs per day.\n2. Calculate the total number of eggs consumed by Janet and used for baking:\n   - Janet eats 3 eggs for breakfast.\n   - She bakes 4 eggs for muffins.\n   - Total eggs used: 3 (breakfast) + 4 (baking) = 7 eggs.\n3. Calculate the remaining number of eggs for sale:\n   - Total eggs laid: 16\n   - Eggs used: 7\n   - Remaining eggs: 16 - 7 = 9 eggs\n4. Calculate the total amount of money made at the farmers' market:\n   - Price per egg: $2\n   - Number of eggs sold: 9\n   - Total money made: 9 * $2 = $18\nTherefore, Janet makes $\\boxed{18}$ dollars every day at the farmers' market."
    }
]


# Split each response into steps at step_token and tokenize problem + response
processed_data = [prepare_input(d["problem"], d["response"], tokenizer=tokenizer, step_token="\n") for d in datas]
input_ids, steps, reward_flags = zip(*processed_data)

# Load the PRM in evaluation mode and pad the examples into one batch
model = PRM_MODEL.from_pretrained(prm_model_path, device_map="auto").eval()
input_ids, attention_mask, reward_flags = prepare_batch_input_for_model(input_ids, reward_flags, tokenizer.pad_token_id)
# Forward pass; the third output holds the rewards
_, _, rewards = model(input_ids=input_ids, attention_mask=attention_mask, return_probs=True)
# Read one reward per reasoning step at the positions marked by reward_flags
step_rewards = derive_step_rewards(rewards, reward_flags)
print("step_rewards:", step_rewards[0])
print("step_rewards:", step_rewards[1])
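
The pipeline above does three things. It splits each response into steps at step_token. As the names suggest, it marks where each step ends in reward_flags. It then reads one reward per step from the model output. The toy sketch below illustrates that idea under two assumptions: tokens are split on whitespace, and each step's reward is read at the token that closes the step. toy_prepare_input and toy_derive_step_rewards are hypothetical stand-ins, not the repository's prepare_input or derive_step_rewards.

```python
# Toy illustration only: whitespace splitting stands in for the real tokenizer,
# and both helpers are hypothetical, not functions from the inference repo.


def toy_prepare_input(problem, response, step_token="\n"):
    """Tokenize problem + response and flag the last token of every step."""
    tokens = problem.split()
    flags = [0] * len(tokens)  # no rewards are read inside the problem text
    steps = [s for s in response.split(step_token) if s.strip()]
    for step in steps:
        step_tokens = step.split() + [step_token]  # close each step with the step token
        tokens.extend(step_tokens)
        flags.extend([0] * (len(step_tokens) - 1) + [1])  # flag the step-end position
    return tokens, steps, flags


def toy_derive_step_rewards(token_rewards, flags):
    """Keep only the per-token rewards at flagged (step-end) positions."""
    return [r for r, f in zip(token_rewards, flags) if f == 1]


tokens, steps, flags = toy_prepare_input(
    "How much does Janet make per day?",
    "3 + 4 = 7 eggs used\n16 - 7 = 9 eggs sold\n9 * 2 = 18 dollars",
)
# Use each token's index as a fake "reward" so the output shows which positions are read.
fake_rewards = list(range(len(tokens)))
print(len(steps), toy_derive_step_rewards(fake_rewards, flags))  # → 3 [14, 22, 29]
```

The real model returns one score per token position. Only the flagged positions become step rewards, which is why the step count and the number of flags must match.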

vLLM server inference

Install vLLM and the vLLM PRM plugin

pip install vllm==0.6.4.post1
git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference
pip install -e .

Start the vLLM server

CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve /path/to/prm_model \
    --host 0.0.0.0 \
    --port 8081 \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.9 \
    --enable-prefix-caching \
    --dtype auto

Send inference requests to the server

from openai import OpenAI
from transformers import AutoTokenizer
from model_utils.io_utils import prepare_input, derive_step_rewards_vllm

prm_model_path = "/path/to/prm_model"
tokenizer = AutoTokenizer.from_pretrained(prm_model_path, trust_remote_code=True)
datas = [
    {
        "problem"  : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
        "response" : "To determine how much money Janet makes every day at the farmers' market, we need to follow these steps:\n1. **Calculate the total number of eggs laid by the ducks per day.**\n   Janet's ducks lay 16 eggs per day.\n2. **Determine the number of eggs Janet uses each day.**\n   - She eats 3 eggs for breakfast every morning.\n   - She bakes muffins for her friends every day with 4 eggs.\n   So, the total number of eggs used per day is:\n   \\[\n   3 + 4 = 7 \\text{ eggs}\n   \\]\n3. **Calculate the number of eggs Janet sells at the farmers' market each day.**\n   Subtract the number of eggs used from the total number of eggs laid:\n   \\[\n   16 - 7 = 9 \\text{ eggs}\n   \\]\n4. **Determine how much money Janet makes from selling the eggs.**\n   She sells each egg for $2, so the total amount of money she makes is:\n   \\[\n   9 \\times 2 = 18 \\text{ dollars}\n   \\]\nTherefore, the amount of money Janet makes every day at the farmers' market is $\\boxed{18}$."
    },
    {
        "problem"  : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
        "response" : "To find out how much Janet makes every day at the farmers' market, we need to follow these steps:\n1. Calculate the total number of eggs laid by the ducks: \n   - Janet's ducks lay 16 eggs per day.\n2. Calculate the total number of eggs consumed by Janet and used for baking:\n   - Janet eats 3 eggs for breakfast.\n   - She bakes 4 eggs for muffins.\n   - Total eggs used: 3 (breakfast) + 4 (baking) = 7 eggs.\n3. Calculate the remaining number of eggs for sale:\n   - Total eggs laid: 16\n   - Eggs used: 7\n   - Remaining eggs: 16 - 7 = 9 eggs\n4. Calculate the total amount of money made at the farmers' market:\n   - Price per egg: $2\n   - Number of eggs sold: 9\n   - Total money made: 9 * $2 = $18\nTherefore, Janet makes $\\boxed{18}$ dollars every day at the farmers' market."
    }
]

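# Preprocess the data: split each response into steps and tokenize
processed_data = [prepare_input(d["problem"], d["response"], tokenizer=tokenizer, step_token="\n") for d in datas]
input_ids, steps, reward_flags = zip(*processed_data)

# Connect to the local vLLM server (started without --api-key, so any placeholder key works)
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8081/v1"
client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key=openai_api_key,
    base_url=openai_api_base,
)
# Use the first (and only) model the server exposes
models = client.models.list()
model = models.data[0].id
# The PRM is queried through the embeddings endpoint, sending token IDs directly
rewards = client.embeddings.create(
    input=input_ids,
    model=model,
)

# Read one reward per reasoning step at the positions marked by reward_flags
step_rewards = derive_step_rewards_vllm(rewards, reward_flags)
print("step_rewards:", step_rewards[0])
print("step_rewards:", step_rewards[1])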

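✨ Key Features

Slow thinking and reasoning: The Skywork o1 Open series incorporates o1-like slow thinking and reasoning, with innate thinking, planning, and reflection visible in its outputs.
Multiple models: The series includes three models, Skywork o1 Open-Llama-3.1-8B, Skywork o1 Open-PRM-Qwen-2.5-1.5B, and Skywork o1 Open-PRM-Qwen-2.5-7B, to fit different scenarios.
Significantly improved reasoning: The models show significant gains in reasoning on standard benchmarks.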

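📚 Documentation

Model Information

The Skywork-o1-Open-PRM series is trained from Qwen2.5-Math-1.5B-Instruct and Qwen2.5-Math-7B-Instruct.

PRM Evaluation

Evaluation Setup

Mathematical evaluation

We used the Qwen2.5-Math evaluation scripts and followed their configuration to ensure consistency. The selected datasets are GSM8K, MATH, GaoKao, CN-Middle School 24, OlympiadBench, AMC-23, and AIME-24. GaoKao and CN-Middle School 24 are Chinese datasets; the rest are English. OlympiadBench, AIME-24, and AMC-23 are competition-level datasets.

Code evaluation

For code evaluation, we used the Qwen2.5-Coder evaluation scripts with largely the same configuration. The selected datasets are HumanEval, MBPP, and LiveCodeBench; for LiveCodeBench we used the 2024.01-2024.11 release. We used the latest version of evalplus (0.3.1) because earlier versions had issues with their tests and code sanitization.

Base models

We evaluated the RMs on three base models: Qwen2.5-7B-Instruct, Llama3.1-8B-Instruct, and Skywork-o1-Open-8B. Responses were sampled from each base model to check how the RMs perform across models. The sampling temperature was 0.7 for math problems and 1.0 for code tasks.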
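
Temperature rescales the model's logits before sampling. Each logit is divided by T before the softmax, so T = 0.7 concentrates probability on the likeliest tokens more than T = 1.0 does. The sketch below uses made-up logits and shows only standard temperature scaling. The evaluation scripts' other sampling settings (for example top-p) are not stated here.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up next-token logits
probs_math = softmax_with_temperature(logits, 0.7)  # temperature used for math
probs_code = softmax_with_temperature(logits, 1.0)  # temperature used for code
print([round(p, 3) for p in probs_math])
print([round(p, 3) for p in probs_code])
```

The lower temperature gives the top token a larger share. The higher temperature used for code therefore produces more diverse samples for the reward model to choose from.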

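Reward models compared

Qwen2.5-Math-RM-72B: an open-source ORM released by the Qwen team.
OpenR-MATH-psa-PRM-7B: an open-source PRM from the OpenR project.
RLHFlow-Deepseek-Data-PRM-8B: an open-source PRM from the RLHFlow project.

Evaluation metrics

Greedy Sampling Pass@1: generate a single answer with greedy decoding.
Majority Voting@64: randomly sample 64 answers and take the most frequent final answer.
Best-of-N@64: rank the 64 answers by the score the reward model (RM) assigns. ORM and PRM scores are computed differently:
- For an ORM, only the reward of the final step is used.
- For a PRM, the average reward over all steps is used.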
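
The three metrics differ only in how one answer is chosen from the samples. The sketch below uses made-up rewards and assumes Best-of-N keeps the single top-scored candidate. The text does not say whether scores are summed over identical answers. majority_vote and best_of_n are hypothetical helpers, not part of the evaluation scripts.

```python
from collections import Counter
from statistics import mean


def majority_vote(answers):
    """Majority Voting@N: the most frequent final answer among the N samples."""
    return Counter(answers).most_common(1)[0][0]


def best_of_n(answers, step_rewards, kind="prm"):
    """Best-of-N@N: return the answer of the top-scored candidate.

    kind="orm": score = reward of the final step only.
    kind="prm": score = mean reward over all steps.
    """
    if kind == "orm":
        scores = [rewards[-1] for rewards in step_rewards]
    elif kind == "prm":
        scores = [mean(rewards) for rewards in step_rewards]
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    best = max(range(len(answers)), key=scores.__getitem__)
    return answers[best]


# Made-up candidates (N = 4 instead of 64) with per-step rewards.
answers = ["18", "18", "20", "24"]
step_rewards = [
    [0.9, 0.8, 0.3],
    [0.7, 0.9, 0.95],
    [0.4, 0.3, 0.99],  # weak middle steps, confident final step
    [0.2, 0.3, 0.4],
]
print(majority_vote(answers))                   # → 18
print(best_of_n(answers, step_rewards, "orm"))  # → 20
print(best_of_n(answers, step_rewards, "prm"))  # → 18
```

In this made-up example, last-step (ORM-style) scoring picks the candidate whose final step looks confident. Step averaging (PRM-style) also accounts for that candidate's weak intermediate steps and picks a different answer.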

Evaluation Results

Mathematical evaluation

Reward Model	Method	GSM8K	MATH	GaoKao	CN-Middle School 24	OlympiadBench	AIME-24	AMC-23	Avg.
N/A	Greedy Sampling Pass@1	91.6	78.1	63.6	67.3	43.1	13.3	55.0	58.9
N/A	Majority Voting@64	93.9	84.3	69.5	73.3	50.4	16.7	52.5	62.9
OpenR-MATH-psa-PRM-7B	Best-of-N@64	95.1	82.7	67.1	70.3	47.6	20.0	57.5	62.9
RLHFlow-Deepseek-Data-PRM-8B	Best-of-N@64	94.4	80.1	59.1	74.3	43.0	20.0	50.0	60.1
Qwen2.5-Math-RM-72B	Best-of-N@64	96.1	86.9	76.0	76.2	53.3	26.7	65.0	68.6
Skywork-o1-Open-PRM-1.5B	Best-of-N@64	94.5	85.0	65.6	73.3	49.9	16.7	62.5	63.9
Skywork-o1-Open-PRM-7B	Best-of-N@64	96.7	87.0	70.3	76.2	52.3	23.3	65.0	67.3
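
The Avg. column can be checked against the unweighted mean of the seven dataset scores in each row:

```python
# Rows copied from the math table: seven dataset scores, then the reported average.
rows = [
    ("Greedy Sampling Pass@1", [91.6, 78.1, 63.6, 67.3, 43.1, 13.3, 55.0], 58.9),
    ("Majority Voting@64", [93.9, 84.3, 69.5, 73.3, 50.4, 16.7, 52.5], 62.9),
    ("OpenR-MATH-psa-PRM-7B", [95.1, 82.7, 67.1, 70.3, 47.6, 20.0, 57.5], 62.9),
    ("RLHFlow-Deepseek-Data-PRM-8B", [94.4, 80.1, 59.1, 74.3, 43.0, 20.0, 50.0], 60.1),
    ("Qwen2.5-Math-RM-72B", [96.1, 86.9, 76.0, 76.2, 53.3, 26.7, 65.0], 68.6),
    ("Skywork-o1-Open-PRM-1.5B", [94.5, 85.0, 65.6, 73.3, 49.9, 16.7, 62.5], 63.9),
    ("Skywork-o1-Open-PRM-7B", [96.7, 87.0, 70.3, 76.2, 52.3, 23.3, 65.0], 67.3),
]

for name, scores, reported in rows:
    computed = sum(scores) / len(scores)  # unweighted mean over the seven datasets
    print(f"{name}: computed {computed:.1f}, reported {reported}")
```

Every row matches its reported average after rounding to one decimal place. The averages weight the small competition sets (AIME-24, AMC-23) equally with GSM8K and MATH.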

Code evaluation

Because the compared PRMs were not trained on code-related tasks, this section covers only Skywork-o1-Open-PRM.

Reward Model	Method	MBPP	MBPP+	HumanEval	HumanEval+	LiveCodeBench (2024.01-2024.11)
N/A	Greedy Sampling Pass@1	79.9	65.9	82.9	78.7	26.0
Skywork-o1-Open-PRM-7B	Best-of-N@64	81.2	68.5	81.1	74.4	31.3
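
To see where PRM reranking helps on code, subtract the greedy row from the Best-of-N@64 row:

```python
# Scores copied from the code table.
benchmarks = ["MBPP", "MBPP+", "HumanEval", "HumanEval+", "LiveCodeBench"]
greedy = [79.9, 65.9, 82.9, 78.7, 26.0]
best_of_64 = [81.2, 68.5, 81.1, 74.4, 31.3]

# Positive delta: Best-of-N@64 with Skywork-o1-Open-PRM-7B beats greedy Pass@1.
deltas = {name: round(b - g, 1) for name, g, b in zip(benchmarks, greedy, best_of_64)}
print(deltas)
# → {'MBPP': 1.3, 'MBPP+': 2.6, 'HumanEval': -1.8, 'HumanEval+': -4.3, 'LiveCodeBench': 5.3}
```

Reranking gains the most on LiveCodeBench and MBPP+, and falls below greedy decoding on HumanEval and HumanEval+.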

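🔧 Technical Details

TODO

Add more results for step-wise RM evaluation methods.
Adjust the RM architecture to improve compatibility with vLLM/sglang inference.
Extend the RM's use cases by covering more types of reasoning tasks.
Mitigate performance conflicts between different reasoning tasks.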

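📄 License

Community use of Skywork models must comply with the Skywork Community License. Skywork models support commercial use. If you plan to use Skywork models or their derivatives for commercial purposes, you must comply with the terms and conditions of the Skywork Community License.

Disclaimer

We hereby declare that Skywork models must not be used for any activity that threatens national or social security, or for any illegal activity. We also ask users not to deploy Skywork models in internet services without appropriate security review and record-filing. We hope all users will follow this principle so that technological progress takes place in a regulated and lawful environment.

We have done our best to ensure the compliance of the data used during model training. However, despite these efforts, unforeseeable risks and issues may remain because of the complexity of the model and data. Therefore, we accept no liability for any problems arising from the use of the Skywork open-source models. This includes, but is not limited to, data security issues, public-opinion risks, and any risks and problems caused by the model being misled, misused, disseminated, or improperly used.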

Citation

If you find our work helpful, please cite us with the following BibTeX entry:

@misc{skyworkopeno12024,
  title={Skywork-o1 Open Series},
  author={He, Jujie and Wei, Tianwen and Yan, Rui and Liu, Jiacai and Wang, Chaojie and Gan, Yimeng and Tu, Shiwen and Liu, Chris Yuhao and Zeng, Liang and Wang, Xiaokun and Wang, Boyang and Li, Yongcong and Zhang, Fuxiang and Xu, Jiacheng and An, Bo and Liu, Yang and Zhou, Yahui},
  year={2024},
  month={November},
  howpublished={\url{https://huggingface.co/Skywork}},
  url={https://huggingface.co/Skywork},
}