🚀 Skywork o1 Open Model Series
The Skywork o1 Open model series is developed by the Skywork team at Kunlun Inc. It innovatively incorporates o1-style slow thinking and reasoning capabilities, bringing a new class of solutions to the AI field and significantly improving performance on reasoning tasks.
🚀 Quick Start
For detailed instructions on using the PRMs, please refer to the Skywork-o1-PRM-Inference repository.
Hugging Face Inference
- Clone the Skywork PRM inference repository
git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference
- Run PRM inference
from transformers import AutoTokenizer
from model_utils.prm_model import PRM_MODEL
from model_utils.io_utils import prepare_input, prepare_batch_input_for_model, derive_step_rewards
prm_model_path = "/path/to/prm_model"
tokenizer = AutoTokenizer.from_pretrained(prm_model_path, trust_remote_code=True)
datas = [
{
"problem" : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
"response" : "To determine how much money Janet makes every day at the farmers' market, we need to follow these steps:\n1. **Calculate the total number of eggs laid by the ducks per day.**\n Janet's ducks lay 16 eggs per day.\n2. **Determine the number of eggs Janet uses each day.**\n - She eats 3 eggs for breakfast every morning.\n - She bakes muffins for her friends every day with 4 eggs.\n So, the total number of eggs used per day is:\n \\[\n 3 + 4 = 7 \\text{ eggs}\n \\]\n3. **Calculate the number of eggs Janet sells at the farmers' market each day.**\n Subtract the number of eggs used from the total number of eggs laid:\n \\[\n 16 - 7 = 9 \\text{ eggs}\n \\]\n4. **Determine how much money Janet makes from selling the eggs.**\n She sells each egg for $2, so the total amount of money she makes is:\n \\[\n 9 \\times 2 = 18 \\text{ dollars}\n \\]\nTherefore, the amount of money Janet makes every day at the farmers' market is $\\boxed{18}$."
},
{
"problem" : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
"response" : "To find out how much Janet makes every day at the farmers' market, we need to follow these steps:\n1. Calculate the total number of eggs laid by the ducks: \n - Janet's ducks lay 16 eggs per day.\n2. Calculate the total number of eggs consumed by Janet and used for baking:\n - Janet eats 3 eggs for breakfast.\n - She bakes 4 eggs for muffins.\n - Total eggs used: 3 (breakfast) + 4 (baking) = 7 eggs.\n3. Calculate the remaining number of eggs for sale:\n - Total eggs laid: 16\n - Eggs used: 7\n - Remaining eggs: 16 - 7 = 9 eggs\n4. Calculate the total amount of money made at the farmers' market:\n - Price per egg: $2\n - Number of eggs sold: 9\n - Total money made: 9 * $2 = $18\nTherefore, Janet makes $\\boxed{18}$ dollars every day at the farmers' market."
}
]
processed_data = [prepare_input(d["problem"], d["response"], tokenizer=tokenizer, step_token="\n") for d in datas]
input_ids, steps, reward_flags = zip(*processed_data)
model = PRM_MODEL.from_pretrained(prm_model_path, device_map="auto").eval()
input_ids, attention_mask, reward_flags = prepare_batch_input_for_model(input_ids, reward_flags, tokenizer.pad_token_id)
_, _, rewards = model(input_ids=input_ids, attention_mask=attention_mask, return_probs=True)
step_rewards = derive_step_rewards(rewards, reward_flags)
print("step_rewards:",step_rewards[0])
print("step_rewards:",step_rewards[1])
vLLM Server Inference
- Install vllm and the vllm PRM plugin
pip install vllm==v0.6.4.post1
git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference
pip install -e .
- Start the vllm server
CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve /path/to/prm_model \
--host 0.0.0.0 \
--port 8081 \
--tensor-parallel-size 4 \
--gpu-memory-utilization 0.9 \
--enable-prefix-caching \
--dtype auto
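Before sending PRM requests, you can optionally confirm the server is reachable. This is a minimal check that assumes the host and port from the command above; it queries the OpenAI-compatible /v1/models endpoint, which the client code in the next step also relies on via `client.models.list()`.
# Optional sanity check: assumes the vllm server above is listening on localhost:8081.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:8081/v1/models") as resp:
    served = json.loads(resp.read().decode("utf-8"))
print("served model:", served["data"][0]["id"])  # should show the served PRM model path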
- Send an inference request to the server
from openai import OpenAI
from transformers import AutoTokenizer
from model_utils.io_utils import prepare_input, derive_step_rewards_vllm
prm_model_path = "/path/to/prm_model"
tokenizer = AutoTokenizer.from_pretrained(prm_model_path, trust_remote_code=True)
datas = [
{
"problem" : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
"response" : "To determine how much money Janet makes every day at the farmers' market, we need to follow these steps:\n1. **Calculate the total number of eggs laid by the ducks per day.**\n Janet's ducks lay 16 eggs per day.\n2. **Determine the number of eggs Janet uses each day.**\n - She eats 3 eggs for breakfast every morning.\n - She bakes muffins for her friends every day with 4 eggs.\n So, the total number of eggs used per day is:\n \\[\n 3 + 4 = 7 \\text{ eggs}\n \\]\n3. **Calculate the number of eggs Janet sells at the farmers' market each day.**\n Subtract the number of eggs used from the total number of eggs laid:\n \\[\n 16 - 7 = 9 \\text{ eggs}\n \\]\n4. **Determine how much money Janet makes from selling the eggs.**\n She sells each egg for $2, so the total amount of money she makes is:\n \\[\n 9 \\times 2 = 18 \\text{ dollars}\n \\]\nTherefore, the amount of money Janet makes every day at the farmers' market is $\\boxed{18}$."
},
{
"problem" : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
"response" : "To find out how much Janet makes every day at the farmers' market, we need to follow these steps:\n1. Calculate the total number of eggs laid by the ducks: \n - Janet's ducks lay 16 eggs per day.\n2. Calculate the total number of eggs consumed by Janet and used for baking:\n - Janet eats 3 eggs for breakfast.\n - She bakes 4 eggs for muffins.\n - Total eggs used: 3 (breakfast) + 4 (baking) = 7 eggs.\n3. Calculate the remaining number of eggs for sale:\n - Total eggs laid: 16\n - Eggs used: 7\n - Remaining eggs: 16 - 7 = 9 eggs\n4. Calculate the total amount of money made at the farmers' market:\n - Price per egg: $2\n - Number of eggs sold: 9\n - Total money made: 9 * $2 = $18\nTherefore, Janet makes $\\boxed{18}$ dollars every day at the farmers' market."
}
]
# Data preprocessing
processed_data = [prepare_input(d["problem"], d["response"], tokenizer=tokenizer, step_token="\n") for d in datas]
input_ids, steps, reward_flags = zip(*processed_data)
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8081/v1"
client = OpenAI(
# defaults to os.environ.get("OPENAI_API_KEY")
api_key=openai_api_key,
base_url=openai_api_base,
)
models = client.models.list()
model = models.data[0].id
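# Note: the PRM plugin serves rewards through the OpenAI-compatible embeddings
# endpoint. The tokenized problem+response (input_ids) is sent as the request
# input, and the returned values are converted into per-step rewards below.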
rewards = client.embeddings.create(
input=input_ids,
model=model,
)
step_rewards = derive_step_rewards_vllm(rewards, reward_flags)
print("step_rewards:",step_rewards[0])
print("step_rewards:",step_rewards[1])
✨ Key Features
- Innovative reasoning ability: incorporates o1-like slow thinking and reasoning, exhibiting thinking, planning, and reflection in its outputs and significantly improving reasoning skills.
- Multiple model options: includes three advanced models trained from different base models, covering reasoning tasks of varying scale and complexity.
- Strong performance: delivers significant gains in reasoning on standard benchmarks, pushing base models toward state-of-the-art reasoning performance.
📦 Installation
Hugging Face Inference
git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference
vLLM Server Inference
pip install vllm==v0.6.4.post1
git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference
pip install -e .
📚 Documentation
Model Introduction
The Skywork o1 Open model series, developed by the Skywork team at Kunlun Inc., includes three advanced models:
- Skywork o1 Open-Llama-3.1-8B: a powerful chat model trained from Llama-3.1-8B, with reasoning ability significantly enhanced by "o1-style" data.
- Skywork o1 Open-PRM-Qwen-2.5-1.5B: a specialized model designed to improve reasoning through incremental process rewards, suited to complex problem solving at a small scale.
- Skywork o1 Open-PRM-Qwen-2.5-7B: scaled up from the 1.5B model to tackle more challenging reasoning tasks.
Model Information
The Skywork-o1-Open-PRM series is trained from Qwen2.5-Math-1.5B-Instruct and Qwen2.5-Math-7B-Instruct.
PRM Evaluation
Evaluation Settings
- Mathematical evaluation: uses the Qwen2.5-Math evaluation scripts, following their configuration. Selected datasets are GSM8K, MATH, GaoKao, CN-Middle School 24, OlympiadBench, AMC-23, and AIME-24. GaoKao and CN-Middle School 24 are Chinese datasets; the rest are English. OlympiadBench, AIME-24, and AMC-23 are competition-level datasets.
- Code evaluation: uses the Qwen2.5-Coder evaluation scripts with the same configuration. Selected datasets are HumanEval, MBPP, and LiveCodeBench, with LiveCodeBench using the 2024.01–2024.11 version. Because earlier versions had issues with test cases and code sanitization, the latest version of evalplus (0.3.1) is used.
Evaluated Base Models
We evaluate RM performance on three base models: Qwen2.5-7B-Instruct, Llama3.1-8B-Instruct, and Skywork-o1-Open-8B. Responses are sampled from each base model to verify RM performance across models, using a sampling temperature of 0.7 for math problems and 1.0 for code-related tasks.
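As a rough, non-authoritative sketch of this sampling setup (the actual evaluation harness is not shown here), the snippet below draws 64 candidates per problem with vLLM at the stated temperatures; the model path, prompt, and token limit are placeholders.
# Illustrative sampling sketch only; assumes vLLM and a local base model checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="/path/to/base_model")  # e.g. one of the three base models above

math_params = SamplingParams(n=64, temperature=0.7, max_tokens=2048)  # math problems
code_params = SamplingParams(n=64, temperature=1.0, max_tokens=2048)  # code-related tasks

outputs = llm.generate(["<math problem prompt>"], math_params)
candidates = [o.text for o in outputs[0].outputs]  # 64 candidate solutions to score with the PRM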
Compared RMs
- Qwen2.5-Math-RM-72B: an open-source ORM released by the Qwen team.
- OpenR-MATH-psa-PRM-7B: an open-source PRM from the OpenR project.
- RLHFlow-Deepseek-Data-PRM-8B: an open-source PRM from the RLHFlow project.
Evaluation Metrics
- Greedy Sampling Pass@1: answers are generated with greedy sampling.
- Majority Voting@64: 64 answers are sampled at random and the majority answer is taken.
- Best-of-N@64: 64 answers are ranked by the output values provided by the reward model (RM). ORMs and PRMs are weighted differently (see the sketch after this list):
  - ORM: uses only the reward of the final step.
  - PRM: uses the average reward across all steps.
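A minimal sketch of these two weighting schemes, assuming `step_rewards[i]` holds the per-step rewards of candidate i as produced by `derive_step_rewards(...)` in the Quick Start code (the selection logic here is illustrative, not the exact evaluation script):
# Illustrative Best-of-N scoring over per-step PRM rewards.
def orm_score(rewards_per_step):
    return rewards_per_step[-1]  # ORM: reward of the final step only

def prm_score(rewards_per_step):
    return sum(rewards_per_step) / len(rewards_per_step)  # PRM: mean over all steps

# Pick the candidate with the highest PRM score among the sampled answers.
best_idx = max(range(len(step_rewards)), key=lambda i: prm_score(step_rewards[i]))
print("best candidate index:", best_idx)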
Evaluation Results
Mathematical Evaluation
Base Model | Reward Model | Method | GSM8K | MATH | GaoKao | CN-Middle School 24 | OlympiadBench | AIME-24 | AMC-23 | Average |
---|---|---|---|---|---|---|---|---|---|---|
Skywork-o1-Open-8B | N/A | Greedy Sampling Pass@1 | 91.6 | 78.1 | 63.6 | 67.3 | 43.1 | 13.3 | 55.0 | 58.9 |
Skywork-o1-Open-8B | N/A | Majority Voting@64 | 93.9 | 84.3 | 69.5 | 73.3 | 50.4 | 16.7 | 52.5 | 62.9 |
Skywork-o1-Open-8B | OpenR-MATH-psa-PRM-7B | Best-of-N@64 | 95.1 | 82.7 | 67.1 | 70.3 | 47.6 | 20.0 | 57.5 | 62.9 |
Skywork-o1-Open-8B | RLHFlow-Deepseek-Data-PRM-8B | Best-of-N@64 | 94.4 | 80.1 | 59.1 | 74.3 | 43.0 | 20.0 | 50.0 | 60.1 |
Skywork-o1-Open-8B | Qwen2.5-Math-RM-72B | Best-of-N@64 | 96.1 | 86.9 | 76.0 | 76.2 | 53.3 | 26.7 | 65.0 | 68.6 |
Skywork-o1-Open-8B | Skywork-o1-Open-PRM-1.5B | Best-of-N@64 | 94.5 | 85.0 | 65.6 | 73.3 | 49.9 | 16.7 | 62.5 | 63.9 |
Skywork-o1-Open-8B | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 96.7 | 87.0 | 70.3 | 76.2 | 52.3 | 23.3 | 65.0 | 67.3 |
Qwen2.5-7B-Instruct | N/A | Greedy Sampling Pass@1 | 91.9 | 75.2 | 55.6 | 75.2 | 39.1 | 13.3 | 45.0 | 56.5 |
Qwen2.5-7B-Instruct | N/A | Majority Voting@64 | 93.5 | 78.4 | 55.3 | 78.2 | 40.1 | 13.3 | 62.5 | 60.2 |
Qwen2.5-7B-Instruct | OpenR-MATH-psa-PRM-7B | Best-of-N@64 | 93.9 | 77.9 | 52.4 | 73.3 | 40.7 | 10.0 | 55.0 | 57.6 |
Qwen2.5-7B-Instruct | RLHFlow-Deepseek-Data-PRM-8B | Best-of-N@64 | 94.1 | 78.1 | 53.2 | 75.2 | 39.1 | 16.7 | 55.0 | 58.8 |
Qwen2.5-7B-Instruct | Qwen2.5-Math-RM-72B | Best-of-N@64 | 94.8 | 82.4 | 65.2 | 80.2 | 45.0 | 13.3 | 62.5 | 63.4 |
Qwen2.5-7B-Instruct | Skywork-o1-Open-PRM-1.5B | Best-of-N@64 | 93.3 | 79.8 | 56.1 | 74.3 | 43.9 | 10.0 | 67.5 | 60.7 |
Qwen2.5-7B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 94.9 | 81.9 | 56.3 | 75.2 | 44.9 | 13.3 | 65.0 | 61.6 |
Llama3.1-8B-Instruct | N/A | Greedy Sampling Pass@1 | 85.3 | 49.7 | 25.3 | 47.5 | 16.6 | 6.7 | 27.5 | 36.9 |
Llama3.1-8B-Instruct | N/A | Majority Voting@64 | 90.9 | 62.9 | 28.0 | 56.4 | 26.4 | 13.3 | 37.5 | 45.1 |
Llama3.1-8B-Instruct | OpenR-MATH-psa-PRM-7B | Best-of-N@64 | 91.8 | 59.4 | 24.7 | 47.5 | 23.0 | 13.3 | 35.0 | 42.1 |
Llama3.1-8B-Instruct | RLHFlow-Deepseek-Data-PRM-8B | Best-of-N@64 | 89.8 | 56.1 | 24.0 | 40.6 | 20.4 | 0.0 | 35.0 | 38.0 |
Llama3.1-8B-Instruct | Qwen2.5-Math-RM-72B | Best-of-N@64 | 94.9 | 72.5 | 44.9 | 65.3 | 34.4 | 23.3 | 60.0 | 56.5 |
Llama3.1-8B-Instruct | Skywork-o1-Open-PRM-1.5B | Best-of-N@64 | 91.7 | 65.6 | 26.8 | 49.5 | 27.0 | 16.7 | 60.0 | 48.2 |
Llama3.1-8B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 94.0 | 69.8 | 32.0 | 56.4 | 29.9 | 16.7 | 52.5 | 50.2 |
Code Evaluation
Since the compared PRMs were not trained on code-related tasks, this section focuses only on the performance of Skywork-o1-Open-PRM.
Base Model | Reward Model | Method | MBPP | MBPP+ | HumanEval | HumanEval+ | LiveCodeBench (2024.01–2024.11) |
---|---|---|---|---|---|---|---|
Skywork-o1-Open-8B | N/A | Greedy Sampling Pass@1 | 79.9 | 65.9 | 82.9 | 78.7 | 26.0 |
Skywork-o1-Open-8B | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 81.2 | 68.5 | 81.1 | 74.4 | 31.3 |
Qwen2.5-7B-Instruct | N/A | Greedy Sampling Pass@1 | 81.7 | 69.3 | 84.8 | 78.0 | 25.3 |
Qwen2.5-7B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 84.9 | 72.5 | 83.5 | 78.0 | 30.7 |
Llama3.1-8B-Instruct | N/A | Greedy Sampling Pass@1 | 75.7 | 64.0 | 68.3 | 63.4 | 16.1 |
Llama3.1-8B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 77.8 | 66.1 | 73.8 | 67.7 | 23.3 |
🔧 Technical Details
- Evaluation scripts: mathematical evaluation uses the Qwen2.5-Math evaluation scripts; code evaluation uses the Qwen2.5-Coder evaluation scripts.
- Data sampling: responses are sampled from each base model with a temperature of 0.7 for math problems and 1.0 for code-related tasks.
- Reward-model weighting: the ORM uses only the final-step reward, while the PRM is weighted by the average reward across all steps.
📄 License
Community use of the Skywork models requires compliance with the Skywork Community License. The models support commercial use; if you plan to use the Skywork models or their derivatives for commercial purposes, you must comply with the terms and conditions of the Skywork Community License.
Disclaimer
We hereby declare that the Skywork models must not be used for any activity that endangers national or public security or involves unlawful conduct. In addition, we ask users not to deploy the Skywork models in internet-facing services without appropriate safety review and record-keeping. We hope all users will abide by these principles to ensure that the technology develops in a regulated and lawful environment.
We have done our utmost to ensure the compliance of the data used during model training. However, given the complexity of models and data, unforeseen risks and issues may still exist. Therefore, we assume no liability for any problems arising from the use of the Skywork open-source models, including but not limited to data security issues, public-opinion risks, or any risks and issues resulting from the models being misled, misused, disseminated, or otherwise improperly exploited.
Citation
If you find our work helpful, please cite us with the following BibTeX entry:
@misc{skyworkopeno12024,
title={Skywork-o1 Open Series},
author={He, Jujie and Wei, Tianwen and Yan, Rui and Liu, Jiacai and Wang, Chaojie and Gan, Yimeng and Tu, Shiwen and Liu, Chris Yuhao and Zeng, Liang and Wang, Xiaokun and Wang, Boyang and Li, Yongcong and Zhang, Fuxiang and Xu, Jiacheng and An, Bo and Liu, Yang and Zhou, Yahui},
year={2024},
month={November},
howpublished={\url{https://huggingface.co/Skywork}},
url={https://huggingface.co/Skywork},
}
TODO
- Add results for more step-wise RM evaluation methods.
- Adapt the RM architecture for better compatibility with vLLM/sglang inference.
- Expand the RM's use cases by incorporating more types of reasoning tasks.
- Mitigate performance conflicts across different reasoning tasks.
Contact
If you have any questions, please feel free to contact us at {jujie.he, jiacai.liu}@kunlun-inc.com.



