Skywork-o1-Open-PRM-Qwen-2.5-7B Open-Source Model: Incremental Reasoning Support, a Practical Choice for Free Deployment

Skywork O1 Open PRM Qwen 2.5 7B

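Developed by Skywork

A 7B-parameter model in the Skywork o1 Open series, trained from Qwen2.5-Math-7B-Instruct, with reasoning strengthened by incremental process rewards.

Large language model

PyTorch

License: Other #MathReasoning #IncrementalRewardModel #CompetitionLevelProblemSolving

Downloads: 3,608

Released: 11/26/2024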

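Model Overview

This model, developed by the Skywork team, is part of the o1 Open model series. It uses an incremental process reward mechanism to strengthen reasoning and is suited to demanding, complex problem-solving tasks.

Model Features

Incremental process reward mechanism

Uses a Process Reward Model (PRM) to make the reasoning process more reliable

Enhanced mathematical reasoning

With Best-of-N@64 reranking, scores 96.7 on GSM8K and 87.0 on MATH in the math evaluation reported in this document

Competition-level problem solving

Handles competition-level math problems such as AMC and AIME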

Model Capabilities

Math problem solving

Code generation

Multi-turn dialogue

Complex reasoning

Competition-level math problem solving

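Use Cases

Education

Math tutoring

Helps students solve math problems from basic to competition level

Strong results on the GSM8K and MATH datasets

Programming assistance

Code generation and optimization

Generates code from natural-language descriptions

With Best-of-N@64 reranking, scores 81.2 on MBPP and 81.1 on HumanEval in the code evaluation reported in this document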

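🚀 Skywork o1 Open Model Series

The Skywork o1 Open model series is developed by the Skywork team at Kunlun Inc. It incorporates o1-like slow thinking and reasoning capabilities. The series comprises three models, which show improved reasoning performance on standard benchmarks.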

🤗 Hugging Face • 🤖 ModelScope

🚀 Quick Start

For detailed instructions on using the PRM, see the Skywork-o1-PRM-Inference repository.

Hugging Face inference

Clone the Skywork PRM inference repository

git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference

Run PRM inference

from transformers import AutoTokenizer
from model_utils.prm_model import PRM_MODEL
from model_utils.io_utils import prepare_input, prepare_batch_input_for_model, derive_step_rewards

prm_model_path = "/path/to/prm_model"
tokenizer = AutoTokenizer.from_pretrained(prm_model_path, trust_remote_code=True)
datas = [
    {
        "problem"  : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
        "response" : "To determine how much money Janet makes every day at the farmers' market, we need to follow these steps:\n1. **Calculate the total number of eggs laid by the ducks per day.**\n   Janet's ducks lay 16 eggs per day.\n2. **Determine the number of eggs Janet uses each day.**\n   - She eats 3 eggs for breakfast every morning.\n   - She bakes muffins for her friends every day with 4 eggs.\n   So, the total number of eggs used per day is:\n   \\[\n   3 + 4 = 7 \\text{ eggs}\n   \\]\n3. **Calculate the number of eggs Janet sells at the farmers' market each day.**\n   Subtract the number of eggs used from the total number of eggs laid:\n   \\[\n   16 - 7 = 9 \\text{ eggs}\n   \\]\n4. **Determine how much money Janet makes from selling the eggs.**\n   She sells each egg for $2, so the total amount of money she makes is:\n   \\[\n   9 \\times 2 = 18 \\text{ dollars}\n   \\]\nTherefore, the amount of money Janet makes every day at the farmers' market is $\\boxed{18}$."
    },
    {
        "problem"  : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
        "response" : "To find out how much Janet makes every day at the farmers' market, we need to follow these steps:\n1. Calculate the total number of eggs laid by the ducks: \n   - Janet's ducks lay 16 eggs per day.\n2. Calculate the total number of eggs consumed by Janet and used for baking:\n   - Janet eats 3 eggs for breakfast.\n   - She bakes 4 eggs for muffins.\n   - Total eggs used: 3 (breakfast) + 4 (baking) = 7 eggs.\n3. Calculate the remaining number of eggs for sale:\n   - Total eggs laid: 16\n   - Eggs used: 7\n   - Remaining eggs: 16 - 7 = 9 eggs\n4. Calculate the total amount of money made at the farmers' market:\n   - Price per egg: $2\n   - Number of eggs sold: 9\n   - Total money made: 9 * $2 = $18\nTherefore, Janet makes $\\boxed{18}$ dollars every day at the farmers' market."
    }
]


# Tokenize each problem/response pair; step_token="\n" marks the step boundaries to be scored.
processed_data = [prepare_input(d["problem"], d["response"], tokenizer=tokenizer, step_token="\n") for d in datas]
input_ids, steps, reward_flags = zip(*processed_data)

# Load the PRM, pad the inputs into a batch, and run one forward pass.
model = PRM_MODEL.from_pretrained(prm_model_path, device_map="auto").eval()
input_ids, attention_mask, reward_flags = prepare_batch_input_for_model(input_ids, reward_flags, tokenizer.pad_token_id)
_, _, rewards = model(input_ids=input_ids, attention_mask=attention_mask, return_probs=True)

# Collect the per-step rewards for each response.
step_rewards = derive_step_rewards(rewards, reward_flags)
print("step_rewards:", step_rewards[0])
print("step_rewards:", step_rewards[1])
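
To read the output, it helps to pair each reward with the step it scores. The following is a minimal sketch, not part of the repository: it assumes each entry of step_rewards is a list of floats aligned one-to-one with the newline-separated steps of the response, `weakest_step` is a hypothetical helper, and the example values are made up rather than real model output.

```python
from typing import List, Tuple


def weakest_step(steps: List[str], rewards: List[float]) -> Tuple[int, str, float]:
    """Return (index, text, reward) of the lowest-scoring step."""
    if len(steps) != len(rewards):
        raise ValueError("expected exactly one reward per step")
    if not steps:
        raise ValueError("no steps to inspect")
    idx = min(range(len(rewards)), key=rewards.__getitem__)
    return idx, steps[idx], rewards[idx]


# Made-up example data (assumption): three steps and their rewards.
example_steps = [
    "Eggs used: 3 + 4 = 7",
    "Eggs left: 16 - 7 = 9",
    "Income: 9 * 2 = 18",
]
example_rewards = [0.97, 0.62, 0.91]

index, text, reward = weakest_step(example_steps, example_rewards)
print(f"lowest-scoring step #{index}: {text!r} (reward {reward})")
```

A low reward on an intermediate step is a hint about where a solution may have gone wrong, which is the information an outcome-only score does not provide.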

vLLM server inference

Install vLLM and the vLLM PRM plugin

pip install vllm==v0.6.4.post1
git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference
pip install -e .

Start the vLLM server

CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve /path/to/prm_model \
    --host 0.0.0.0 \
    --port 8081 \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.9 \
    --enable-prefix-caching \
    --dtype auto

Send inference requests to the server

from openai import OpenAI
from transformers import AutoTokenizer
from model_utils.io_utils import prepare_input, derive_step_rewards_vllm

prm_model_path = "/path/to/prm_model"
tokenizer = AutoTokenizer.from_pretrained(prm_model_path, trust_remote_code=True)
datas = [
    {
        "problem"  : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
        "response" : "To determine how much money Janet makes every day at the farmers' market, we need to follow these steps:\n1. **Calculate the total number of eggs laid by the ducks per day.**\n   Janet's ducks lay 16 eggs per day.\n2. **Determine the number of eggs Janet uses each day.**\n   - She eats 3 eggs for breakfast every morning.\n   - She bakes muffins for her friends every day with 4 eggs.\n   So, the total number of eggs used per day is:\n   \\[\n   3 + 4 = 7 \\text{ eggs}\n   \\]\n3. **Calculate the number of eggs Janet sells at the farmers' market each day.**\n   Subtract the number of eggs used from the total number of eggs laid:\n   \\[\n   16 - 7 = 9 \\text{ eggs}\n   \\]\n4. **Determine how much money Janet makes from selling the eggs.**\n   She sells each egg for $2, so the total amount of money she makes is:\n   \\[\n   9 \\times 2 = 18 \\text{ dollars}\n   \\]\nTherefore, the amount of money Janet makes every day at the farmers' market is $\\boxed{18}$."
    },
    {
        "problem"  : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
        "response" : "To find out how much Janet makes every day at the farmers' market, we need to follow these steps:\n1. Calculate the total number of eggs laid by the ducks: \n   - Janet's ducks lay 16 eggs per day.\n2. Calculate the total number of eggs consumed by Janet and used for baking:\n   - Janet eats 3 eggs for breakfast.\n   - She bakes 4 eggs for muffins.\n   - Total eggs used: 3 (breakfast) + 4 (baking) = 7 eggs.\n3. Calculate the remaining number of eggs for sale:\n   - Total eggs laid: 16\n   - Eggs used: 7\n   - Remaining eggs: 16 - 7 = 9 eggs\n4. Calculate the total amount of money made at the farmers' market:\n   - Price per egg: $2\n   - Number of eggs sold: 9\n   - Total money made: 9 * $2 = $18\nTherefore, Janet makes $\\boxed{18}$ dollars every day at the farmers' market."
    }
]

# Preprocess the data
processed_data = [prepare_input(d["problem"], d["response"], tokenizer=tokenizer, step_token="\n") for d in datas]
input_ids, steps, reward_flags = zip(*processed_data)

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8081/v1"
client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key=openai_api_key,
    base_url=openai_api_base,
)
models = client.models.list()
model = models.data[0].id
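# The PRM is queried through the embeddings endpoint; the token IDs are sent directly.
rewards = client.embeddings.create(
    input=input_ids,
    model=model,
)

# Convert the server output into per-step rewards for each response.
step_rewards = derive_step_rewards_vllm(rewards, reward_flags)
print("step_rewards:", step_rewards[0])
print("step_rewards:", step_rewards[1])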

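✨ Key Features

Slow thinking and reasoning: The Skywork o1 Open model series incorporates o1-like slow thinking and reasoning capabilities, showing innate thinking, planning, and reflection in its outputs.
Multiple models: The series includes three models, Skywork o1 Open-Llama-3.1-8B, Skywork o1 Open-PRM-Qwen-2.5-1.5B, and Skywork o1 Open-PRM-Qwen-2.5-7B, covering different use cases.
Improved reasoning: Reasoning performance improves markedly on standard benchmarks.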

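📚 Documentation

Model Information

The Skywork-o1-Open-PRM series is trained from Qwen2.5-Math-1.5B-Instruct and Qwen2.5-Math-7B-Instruct.

PRM Evaluation

Evaluation Setup

Math Evaluation

We used the evaluation scripts from Qwen2.5-Math and followed their configuration for consistency. The selected datasets are GSM8K, MATH, GaoKao, CN-Middle School 24, OlympiadBench, AMC-23, and AIME-24. GaoKao and CN-Middle School 24 are Chinese datasets; the rest are English. OlympiadBench, AIME-24, and AMC-23 are competition-level datasets.

Code Evaluation

For code evaluation, we adopted the evaluation scripts from Qwen2.5-Coder and largely kept the same configuration. The selected datasets are HumanEval, MBPP, and LiveCodeBench; for LiveCodeBench we used version 2024.01-2024.11. Because earlier versions had issues with testing and code sanitization, we used the latest version of evalplus (0.3.1).

Base Models for Evaluation

We evaluated the RMs on three base models: Qwen2.5-7B-Instruct, Llama3.1-8B-Instruct, and Skywork-o1-Open-8B. Responses were sampled from each to verify RM performance across models. The sampling temperature was 0.7 for math problems and 1.0 for code tasks.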

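Compared RMs

Qwen2.5-Math-RM-72B: an open-source ORM from the Qwen team.
OpenR-MATH-psa-PRM-7B: an open-source PRM from the OpenR project.
RLHFlow-Deepseek-Data-PRM-8B: an open-source PRM from the RLHFlow project.

Evaluation Metrics

Greedy Sampling Pass@1: generate a single answer with greedy decoding.
Majority Voting@64: randomly sample 64 answers and take the most common one.
Best-of-N@64: rank 64 sampled answers by the reward model (RM) output. ORMs and PRMs are weighted differently:
- For an ORM, only the reward of the final step is used.
- For a PRM, the average reward over all steps is used.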
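
The selection rules above can be sketched in a few lines. This is an illustrative sketch rather than the evaluation scripts actually used: it assumes that Best-of-N returns the single top-ranked candidate, that each candidate comes with a list of per-step rewards, and the function names and example values are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent final answer among the sampled candidates."""
    return Counter(answers).most_common(1)[0][0]


def candidate_score(step_rewards: Sequence[float], kind: str) -> float:
    """Collapse per-step rewards into one score for a candidate.

    kind="orm": use only the reward of the final step.
    kind="prm": use the mean reward over all steps.
    """
    if not step_rewards:
        raise ValueError("candidate has no step rewards")
    if kind == "orm":
        return step_rewards[-1]
    if kind == "prm":
        return mean(step_rewards)
    raise ValueError(f"unknown reward model kind: {kind!r}")


def best_of_n(answers: Sequence[str], rewards: Sequence[Sequence[float]], kind: str) -> str:
    """Return the answer of the candidate with the highest score."""
    scores = [candidate_score(r, kind) for r in rewards]
    best = max(range(len(scores)), key=scores.__getitem__)
    return answers[best]


# Made-up example with N=3 instead of 64 (assumption, for readability).
answers = ["18", "14", "14"]
rewards = [[0.9, 0.8, 0.7], [0.9, 0.2, 0.95], [0.5, 0.4, 0.3]]

print("majority vote:", majority_vote(answers))
print("best-of-N (PRM):", best_of_n(answers, rewards, "prm"))
print("best-of-N (ORM):", best_of_n(answers, rewards, "orm"))
```

In this toy data the second candidate has a weak middle step but a confident final step, so last-step scoring and mean-over-steps scoring pick different answers.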

Evaluation Results

Math Evaluation

Reward model	Method	GSM8K	MATH	GaoKao	CN-Middle School 24	OlympiadBench	AIME-24	AMC-23	Average
N/A	Greedy Sampling Pass@1	91.6	78.1	63.6	67.3	43.1	13.3	55.0	58.9
N/A	Majority Voting@64	93.9	84.3	69.5	73.3	50.4	16.7	52.5	62.9
OpenR-MATH-psa-PRM-7B	Best-of-N@64	95.1	82.7	67.1	70.3	47.6	20.0	57.5	62.9
RLHFlow-Deepseek-Data-PRM-8B	Best-of-N@64	94.4	80.1	59.1	74.3	43.0	20.0	50.0	60.1
Qwen2.5-Math-RM-72B	Best-of-N@64	96.1	86.9	76.0	76.2	53.3	26.7	65.0	68.6
Skywork-o1-Open-PRM-1.5B	Best-of-N@64	94.5	85.0	65.6	73.3	49.9	16.7	62.5	63.9
Skywork-o1-Open-PRM-7B	Best-of-N@64	96.7	87.0	70.3	76.2	52.3	23.3	65.0	67.3
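
The last column of the table is consistent with an unweighted mean over the seven datasets. The short check below recomputes it for two rows; the unweighted-mean reading is an assumption inferred from the numbers, and the variable names are illustrative.

```python
# Per-dataset scores copied from the table, in column order:
# GSM8K, MATH, GaoKao, CN-Middle School 24, OlympiadBench, AIME-24, AMC-23
greedy_pass_at_1 = [91.6, 78.1, 63.6, 67.3, 43.1, 13.3, 55.0]
skywork_prm_7b_bon64 = [96.7, 87.0, 70.3, 76.2, 52.3, 23.3, 65.0]


def table_average(scores):
    """Unweighted mean over the datasets, rounded to one decimal like the table."""
    return round(sum(scores) / len(scores), 1)


greedy_avg = table_average(greedy_pass_at_1)
prm_avg = table_average(skywork_prm_7b_bon64)
print("greedy average:", greedy_avg)
print("Skywork-o1-Open-PRM-7B average:", prm_avg)
print("gain over greedy:", round(prm_avg - greedy_avg, 1))
```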

Code Evaluation

Because the compared PRMs were not trained on code tasks, this section reports only the performance of Skywork-o1-Open-PRM.

Reward model	Method	MBPP	MBPP+	HumanEval	HumanEval+	LiveCodeBench-2024.01-2024.11
N/A	Greedy Sampling Pass@1	79.9	65.9	82.9	78.7	26.0
Skywork-o1-Open-PRM-7B	Best-of-N@64	81.2	68.5	81.1	74.4	31.3
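
The per-benchmark difference between the two rows shows where reranking helps and where it does not. The snippet below only does arithmetic on the numbers in the table; the dictionary layout and names are illustrative.

```python
# Scores copied from the code evaluation table.
greedy = {"MBPP": 79.9, "MBPP+": 65.9, "HumanEval": 82.9, "HumanEval+": 78.7, "LiveCodeBench": 26.0}
best_of_n = {"MBPP": 81.2, "MBPP+": 68.5, "HumanEval": 81.1, "HumanEval+": 74.4, "LiveCodeBench": 31.3}

# Positive values mean Best-of-N@64 with the PRM beats greedy decoding.
deltas = {name: round(best_of_n[name] - greedy[name], 1) for name in greedy}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f}")
```

On these figures Best-of-N@64 improves MBPP, MBPP+, and LiveCodeBench but falls below greedy decoding on HumanEval and HumanEval+.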

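🔧 Technical Details

TODO

Add more results for step-wise RM evaluation methods.
Adjust the RM architecture to improve compatibility with vLLM/sglang inference.
Extend RM use cases by incorporating more types of reasoning tasks.
Mitigate performance conflicts across different reasoning tasks.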

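📄 License

Community use of Skywork models is subject to the Skywork Community License. Skywork models support commercial use. If you plan to use Skywork models or their derivatives for commercial purposes, you must comply with the terms and conditions of the Skywork Community License.

Disclaimer

We hereby declare that Skywork models must not be used for any activity that threatens national or societal security or for any unlawful purpose. We also ask users not to deploy Skywork models in internet services without appropriate security review and filing. We hope all users will follow this principle so that technical progress takes place in a regulated and lawful environment.

We have done our best to ensure the compliance of the data used in training. Despite these efforts, the complexity of the model and data means that unforeseen risks and issues may remain. We therefore accept no responsibility for any problems arising from use of the Skywork open-source models, including but not limited to data security issues, public opinion risks, or any risks and issues caused by the model being misled, misused, disseminated, or improperly used.

Citation

If you find our work helpful, please cite us using the following BibTeX entry: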

@misc{skyworkopeno12024,
  title={Skywork-o1 Open Series},
  author={He, Jujie and Wei, Tianwen and Yan, Rui and Liu, Jiacai and Wang, Chaojie and Gan, Yimeng and Tu, Shiwen and Liu, Chris Yuhao and Zeng, Liang and Wang, Xiaokun and Wang, Boyang and Li, Yongcong and Zhang, Fuxiang and Xu, Jiacheng and An, Bo and Liu, Yang and Zhou, Yahui},
  year={2024},
  month={November},
  howpublished={\url{https://huggingface.co/Skywork}},
  url={https://huggingface.co/Skywork},
}