🚀 Skywork o1 Open Model Series
The Skywork o1 Open model series is developed by the Skywork team at Kunlun Inc. It innovatively incorporates o1-style slow thinking and reasoning capabilities, bringing a new class of solutions to the AI field and significantly improving performance on reasoning tasks.
🚀 Quick Start
For detailed instructions on using the PRMs, please refer to the Skywork-o1-PRM-Inference repository.
Hugging Face Inference
- Clone the Skywork PRM inference repository
git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference
- Run PRM inference
from transformers import AutoTokenizer
from model_utils.prm_model import PRM_MODEL
from model_utils.io_utils import prepare_input, prepare_batch_input_for_model, derive_step_rewards
prm_model_path = "/path/to/prm_model"
tokenizer = AutoTokenizer.from_pretrained(prm_model_path, trust_remote_code=True)
datas = [
{
"problem" : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
"response" : "To determine how much money Janet makes every day at the farmers' market, we need to follow these steps:\n1. **Calculate the total number of eggs laid by the ducks per day.**\n Janet's ducks lay 16 eggs per day.\n2. **Determine the number of eggs Janet uses each day.**\n - She eats 3 eggs for breakfast every morning.\n - She bakes muffins for her friends every day with 4 eggs.\n So, the total number of eggs used per day is:\n \\[\n 3 + 4 = 7 \\text{ eggs}\n \\]\n3. **Calculate the number of eggs Janet sells at the farmers' market each day.**\n Subtract the number of eggs used from the total number of eggs laid:\n \\[\n 16 - 7 = 9 \\text{ eggs}\n \\]\n4. **Determine how much money Janet makes from selling the eggs.**\n She sells each egg for $2, so the total amount of money she makes is:\n \\[\n 9 \\times 2 = 18 \\text{ dollars}\n \\]\nTherefore, the amount of money Janet makes every day at the farmers' market is $\\boxed{18}$."
},
{
"problem" : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
"response" : "To find out how much Janet makes every day at the farmers' market, we need to follow these steps:\n1. Calculate the total number of eggs laid by the ducks: \n - Janet's ducks lay 16 eggs per day.\n2. Calculate the total number of eggs consumed by Janet and used for baking:\n - Janet eats 3 eggs for breakfast.\n - She bakes 4 eggs for muffins.\n - Total eggs used: 3 (breakfast) + 4 (baking) = 7 eggs.\n3. Calculate the remaining number of eggs for sale:\n - Total eggs laid: 16\n - Eggs used: 7\n - Remaining eggs: 16 - 7 = 9 eggs\n4. Calculate the total amount of money made at the farmers' market:\n - Price per egg: $2\n - Number of eggs sold: 9\n - Total money made: 9 * $2 = $18\nTherefore, Janet makes $\\boxed{18}$ dollars every day at the farmers' market."
}
]
processed_data = [prepare_input(d["problem"], d["response"], tokenizer=tokenizer, step_token="\n") for d in datas]
input_ids, steps, reward_flags = zip(*processed_data)
model = PRM_MODEL.from_pretrained(prm_model_path, device_map="auto").eval()
input_ids, attention_mask, reward_flags = prepare_batch_input_for_model(input_ids, reward_flags, tokenizer.pad_token_id)
_, _, rewards = model(input_ids=input_ids, attention_mask=attention_mask, return_probs=True)
step_rewards = derive_step_rewards(rewards, reward_flags)
print("step_rewards:",step_rewards[0])
print("step_rewards:",step_rewards[1])
vLLM Server Inference
- Install vllm and the vllm PRM plugin
pip install vllm==v0.6.4.post1
git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference
pip install -e .
- Start the vllm server
CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve /path/to/prm_model \
--host 0.0.0.0 \
--port 8081 \
--tensor-parallel-size 4 \
--gpu-memory-utilization 0.9 \
--enable-prefix-caching \
--dtype auto
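Before sending PRM requests, you can optionally confirm the server is reachable. This is a minimal check that assumes the host and port from the command above; it queries the OpenAI-compatible /v1/models endpoint, which the client code in the next step also relies on via `client.models.list()`.
# Optional sanity check: assumes the vllm server above is listening on localhost:8081.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:8081/v1/models") as resp:
    served = json.loads(resp.read().decode("utf-8"))
print("served model:", served["data"][0]["id"])  # should show the served PRM model path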
- Send an inference request to the server
from openai import OpenAI
from transformers import AutoTokenizer
from model_utils.io_utils import prepare_input, derive_step_rewards_vllm
prm_model_path = "/path/to/prm_model"
tokenizer = AutoTokenizer.from_pretrained(prm_model_path, trust_remote_code=True)
datas = [
{
"problem" : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
"response" : "To determine how much money Janet makes every day at the farmers' market, we need to follow these steps:\n1. **Calculate the total number of eggs laid by the ducks per day.**\n Janet's ducks lay 16 eggs per day.\n2. **Determine the number of eggs Janet uses each day.**\n - She eats 3 eggs for breakfast every morning.\n - She bakes muffins for her friends every day with 4 eggs.\n So, the total number of eggs used per day is:\n \\[\n 3 + 4 = 7 \\text{ eggs}\n \\]\n3. **Calculate the number of eggs Janet sells at the farmers' market each day.**\n Subtract the number of eggs used from the total number of eggs laid:\n \\[\n 16 - 7 = 9 \\text{ eggs}\n \\]\n4. **Determine how much money Janet makes from selling the eggs.**\n She sells each egg for $2, so the total amount of money she makes is:\n \\[\n 9 \\times 2 = 18 \\text{ dollars}\n \\]\nTherefore, the amount of money Janet makes every day at the farmers' market is $\\boxed{18}$."
},
{
"problem" : "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
"response" : "To find out how much Janet makes every day at the farmers' market, we need to follow these steps:\n1. Calculate the total number of eggs laid by the ducks: \n - Janet's ducks lay 16 eggs per day.\n2. Calculate the total number of eggs consumed by Janet and used for baking:\n - Janet eats 3 eggs for breakfast.\n - She bakes 4 eggs for muffins.\n - Total eggs used: 3 (breakfast) + 4 (baking) = 7 eggs.\n3. Calculate the remaining number of eggs for sale:\n - Total eggs laid: 16\n - Eggs used: 7\n - Remaining eggs: 16 - 7 = 9 eggs\n4. Calculate the total amount of money made at the farmers' market:\n - Price per egg: $2\n - Number of eggs sold: 9\n - Total money made: 9 * $2 = $18\nTherefore, Janet makes $\\boxed{18}$ dollars every day at the farmers' market."
}
]
# Data preprocessing
processed_data = [prepare_input(d["problem"], d["response"], tokenizer=tokenizer, step_token="\n") for d in datas]
input_ids, steps, reward_flags = zip(*processed_data)
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8081/v1"
client = OpenAI(
# defaults to os.environ.get("OPENAI_API_KEY")
api_key=openai_api_key,
base_url=openai_api_base,
)
models = client.models.list()
model = models.data[0].id
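# Note: the PRM plugin serves rewards through the OpenAI-compatible embeddings
# endpoint. The tokenized problem+response (input_ids) is sent as the request
# input, and the returned values are converted into per-step rewards below.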
rewards = client.embeddings.create(
input=input_ids,
model=model,
)
step_rewards = derive_step_rewards_vllm(rewards, reward_flags)
print("step_rewards:",step_rewards[0])
print("step_rewards:",step_rewards[1])
✨ Key Features
- Innovative reasoning ability: incorporates o1-like slow thinking and reasoning, exhibiting thinking, planning, and reflection in its outputs and significantly improving reasoning skills.
- Multiple model options: includes three advanced models trained from different base models, covering reasoning tasks of varying scale and complexity.
- Strong performance: delivers significant gains in reasoning on standard benchmarks, pushing base models toward state-of-the-art reasoning performance.
📦 Installation
Hugging Face Inference
git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference
vLLM Server Inference
pip install vllm==v0.6.4.post1
git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference
pip install -e .
📚 Documentation
Model Introduction
The Skywork o1 Open model series, developed by the Skywork team at Kunlun Inc., includes three advanced models:
- Skywork o1 Open-Llama-3.1-8B: a powerful chat model trained from Llama-3.1-8B, with reasoning ability significantly enhanced by "o1-style" data.
- Skywork o1 Open-PRM-Qwen-2.5-1.5B: a specialized model designed to improve reasoning through incremental process rewards, suited to complex problem solving at a small scale.
- Skywork o1 Open-PRM-Qwen-2.5-7B: scaled up from the 1.5B model to tackle more challenging reasoning tasks.
Model Information
The Skywork-o1-Open-PRM series is trained from Qwen2.5-Math-1.5B-Instruct and Qwen2.5-Math-7B-Instruct.
PRM Evaluation
Evaluation Settings
- Mathematical evaluation: uses the Qwen2.5-Math evaluation scripts, following their configuration. Selected datasets are GSM8K, MATH, GaoKao, CN-Middle School 24, OlympiadBench, AMC-23, and AIME-24. GaoKao and CN-Middle School 24 are Chinese datasets; the rest are English. OlympiadBench, AIME-24, and AMC-23 are competition-level datasets.
- Code evaluation: uses the Qwen2.5-Coder evaluation scripts with the same configuration. Selected datasets are HumanEval, MBPP, and LiveCodeBench, with LiveCodeBench using the 2024.01–2024.11 version. Because earlier versions had issues with test cases and code sanitization, the latest version of evalplus (0.3.1) is used.
Evaluated Base Models
We evaluate RM performance on three base models: Qwen2.5-7B-Instruct, Llama3.1-8B-Instruct, and Skywork-o1-Open-8B. Responses are sampled from each base model to verify RM performance across models, using a sampling temperature of 0.7 for math problems and 1.0 for code-related tasks.
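As a rough, non-authoritative sketch of this sampling setup (the actual evaluation harness is not shown here), the snippet below draws 64 candidates per problem with vLLM at the stated temperatures; the model path, prompt, and token limit are placeholders.
# Illustrative sampling sketch only; assumes vLLM and a local base model checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="/path/to/base_model")  # e.g. one of the three base models above

math_params = SamplingParams(n=64, temperature=0.7, max_tokens=2048)  # math problems
code_params = SamplingParams(n=64, temperature=1.0, max_tokens=2048)  # code-related tasks

outputs = llm.generate(["<math problem prompt>"], math_params)
candidates = [o.text for o in outputs[0].outputs]  # 64 candidate solutions to score with the PRM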
Compared RMs
- Qwen2.5-Math-RM-72B: an open-source ORM released by the Qwen team.
- OpenR-MATH-psa-PRM-7B: an open-source PRM from the OpenR project.
- RLHFlow-Deepseek-Data-PRM-8B: an open-source PRM from the RLHFlow project.
Evaluation Metrics
- Greedy Sampling Pass@1: answers are generated with greedy sampling.
- Majority Voting@64: 64 answers are sampled at random and the majority answer is taken.
- Best-of-N@64: 64 answers are ranked by the output values provided by the reward model (RM). ORMs and PRMs are weighted differently (see the sketch after this list):
  - ORM: uses only the reward of the final step.
  - PRM: uses the average reward across all steps.
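A minimal sketch of these two weighting schemes, assuming `step_rewards[i]` holds the per-step rewards of candidate i as produced by `derive_step_rewards(...)` in the Quick Start code (the selection logic here is illustrative, not the exact evaluation script):
# Illustrative Best-of-N scoring over per-step PRM rewards.
def orm_score(rewards_per_step):
    return rewards_per_step[-1]  # ORM: reward of the final step only

def prm_score(rewards_per_step):
    return sum(rewards_per_step) / len(rewards_per_step)  # PRM: mean over all steps

# Pick the candidate with the highest PRM score among the sampled answers.
best_idx = max(range(len(step_rewards)), key=lambda i: prm_score(step_rewards[i]))
print("best candidate index:", best_idx)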
Evaluation Results
Mathematical Evaluation
Base Model | Reward Model | Method | GSM8K | MATH | GaoKao | CN-Middle School 24 | OlympiadBench | AIME-24 | AMC-23 | Average |
---|---|---|---|---|---|---|---|---|---|---|
Skywork-o1-Open-8B | N/A | Greedy Sampling Pass@1 | 91.6 | 78.1 | 63.6 | 67.3 | 43.1 | 13.3 | 55.0 | 58.9 |
Skywork-o1-Open-8B | N/A | Majority Voting@64 | 93.9 | 84.3 | 69.5 | 73.3 | 50.4 | 16.7 | 52.5 | 62.9 |
Skywork-o1-Open-8B | OpenR-MATH-psa-PRM-7B | Best-of-N@64 | 95.1 | 82.7 | 67.1 | 70.3 | 47.6 | 20.0 | 57.5 | 62.9 |
Skywork-o1-Open-8B | RLHFlow-Deepseek-Data-PRM-8B | Best-of-N@64 | 94.4 | 80.1 | 59.1 | 74.3 | 43.0 | 20.0 | 50.0 | 60.1 |
Skywork-o1-Open-8B | Qwen2.5-Math-RM-72B | Best-of-N@64 | 96.1 | 86.9 | 76.0 | 76.2 | 53.3 | 26.7 | 65.0 | 68.6 |
Skywork-o1-Open-8B | Skywork-o1-Open-PRM-1.5B | Best-of-N@64 | 94.5 | 85.0 | 65.6 | 73.3 | 49.9 | 16.7 | 62.5 | 63.9 |
Skywork-o1-Open-8B | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 96.7 | 87.0 | 70.3 | 76.2 | 52.3 | 23.3 | 65.0 | 67.3 |
Qwen2.5-7B-Instruct | N/A | Greedy Sampling Pass@1 | 91.9 | 75.2 | 55.6 | 75.2 | 39.1 | 13.3 | 45.0 | 56.5 |
Qwen2.5-7B-Instruct | N/A | Majority Voting@64 | 93.5 | 78.4 | 55.3 | 78.2 | 40.1 | 13.3 | 62.5 | 60.2 |
Qwen2.5-7B-Instruct | OpenR-MATH-psa-PRM-7B | Best-of-N@64 | 93.9 | 77.9 | 52.4 | 73.3 | 40.7 | 10.0 | 55.0 | 57.6 |
Qwen2.5-7B-Instruct | RLHFlow-Deepseek-Data-PRM-8B | Best-of-N@64 | 94.1 | 78.1 | 53.2 | 75.2 | 39.1 | 16.7 | 55.0 | 58.8 |
Qwen2.5-7B-Instruct | Qwen2.5-Math-RM-72B | Best-of-N@64 | 94.8 | 82.4 | 65.2 | 80.2 | 45.0 | 13.3 | 62.5 | 63.4 |
Qwen2.5-7B-Instruct | Skywork-o1-Open-PRM-1.5B | Best-of-N@64 | 93.3 | 79.8 | 56.1 | 74.3 | 43.9 | 10.0 | 67.5 | 60.7 |
Qwen2.5-7B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 94.9 | 81.9 | 56.3 | 75.2 | 44.9 | 13.3 | 65.0 | 61.6 |
Llama3.1-8B-Instruct | N/A | Greedy Sampling Pass@1 | 85.3 | 49.7 | 25.3 | 47.5 | 16.6 | 6.7 | 27.5 | 36.9 |
Llama3.1-8B-Instruct | N/A | Majority Voting@64 | 90.9 | 62.9 | 28.0 | 56.4 | 26.4 | 13.3 | 37.5 | 45.1 |
Llama3.1-8B-Instruct | OpenR-MATH-psa-PRM-7B | Best-of-N@64 | 91.8 | 59.4 | 24.7 | 47.5 | 23.0 | 13.3 | 35.0 | 42.1 |
Llama3.1-8B-Instruct | RLHFlow-Deepseek-Data-PRM-8B | Best-of-N@64 | 89.8 | 56.1 | 24.0 | 40.6 | 20.4 | 0.0 | 35.0 | 38.0 |
Llama3.1-8B-Instruct | Qwen2.5-Math-RM-72B | Best-of-N@64 | 94.9 | 72.5 | 44.9 | 65.3 | 34.4 | 23.3 | 60.0 | 56.5 |
Llama3.1-8B-Instruct | Skywork-o1-Open-PRM-1.5B | Best-of-N@64 | 91.7 | 65.6 | 26.8 | 49.5 | 27.0 | 16.7 | 60.0 | 48.2 |
Llama3.1-8B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 94.0 | 69.8 | 32.0 | 56.4 | 29.9 | 16.7 | 52.5 | 50.2 |
Code Evaluation
Since the compared PRMs were not trained on code-related tasks, this section focuses only on the performance of Skywork-o1-Open-PRM.
Base Model | Reward Model | Method | MBPP | MBPP+ | HumanEval | HumanEval+ | LiveCodeBench (2024.01–2024.11) |
---|---|---|---|---|---|---|---|
Skywork-o1-Open-8B | N/A | Greedy Sampling Pass@1 | 79.9 | 65.9 | 82.9 | 78.7 | 26.0 |
Skywork-o1-Open-8B | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 81.2 | 68.5 | 81.1 | 74.4 | 31.3 |
Qwen2.5-7B-Instruct | N/A | Greedy Sampling Pass@1 | 81.7 | 69.3 | 84.8 | 78.0 | 25.3 |
Qwen2.5-7B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 84.9 | 72.5 | 83.5 | 78.0 | 30.7 |
Llama3.1-8B-Instruct | N/A | Greedy Sampling Pass@1 | 75.7 | 64.0 | 68.3 | 63.4 | 16.1 |
Llama3.1-8B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 77.8 | 66.1 | 73.8 | 67.7 | 23.3 |
🔧 Technical Details
- Evaluation scripts: mathematical evaluation uses the Qwen2.5-Math evaluation scripts; code evaluation uses the Qwen2.5-Coder evaluation scripts.
- Data sampling: responses are sampled from each base model with a temperature of 0.7 for math problems and 1.0 for code-related tasks.
- Reward-model weighting: the ORM uses only the final-step reward, while the PRM is weighted by the average reward across all steps.
📄 License
Community use of the Skywork models requires compliance with the Skywork Community License. The models support commercial use; if you plan to use the Skywork models or their derivatives for commercial purposes, you must comply with the terms and conditions of the Skywork Community License.
Disclaimer
We hereby declare that the Skywork models must not be used for any activity that endangers national or public security or involves unlawful conduct. In addition, we ask users not to deploy the Skywork models in internet-facing services without appropriate safety review and record-keeping. We hope all users will abide by these principles to ensure that the technology develops in a regulated and lawful environment.
We have done our utmost to ensure the compliance of the data used during model training. However, given the complexity of models and data, unforeseen risks and issues may still exist. Therefore, we assume no liability for any problems arising from the use of the Skywork open-source models, including but not limited to data security issues, public-opinion risks, or any risks and issues resulting from the models being misled, misused, disseminated, or otherwise improperly exploited.
Citation
If you find our work helpful, please cite us with the following BibTeX entry:
@misc{skyworkopeno12024,
title={Skywork-o1 Open Series},
author={He, Jujie and Wei, Tianwen and Yan, Rui and Liu, Jiacai and Wang, Chaojie and Gan, Yimeng and Tu, Shiwen and Liu, Chris Yuhao and Zeng, Liang and Wang, Xiaokun and Wang, Boyang and Li, Yongcong and Zhang, Fuxiang and Xu, Jiacheng and An, Bo and Liu, Yang and Zhou, Yahui},
year={2024},
month={November},
howpublished={\url{https://huggingface.co/Skywork}},
url={https://huggingface.co/Skywork},
}
TODO
- Add results for more step-wise RM evaluation methods.
- Adapt the RM architecture for better compatibility with vLLM/sglang inference.
- Expand the RM's use cases by incorporating more types of reasoning tasks.
- Mitigate performance conflicts across different reasoning tasks.
Contact
If you have any questions, please feel free to contact us at {jujie.he, jiacai.liu}@kunlun-inc.com.



