🚀 Skywork o1 Open Model Series
The Skywork o1 Open model series, developed by the Skywork team at Kunlun Inc., brings o1-like slow-thinking and reasoning capabilities to open models. The series aims to advance AI reasoning skills toward the state of the art on related tasks.
✨ Features
- Incorporates o1-like slow thinking and reasoning capabilities.
- Exhibits innate thinking, planning, and reflecting capabilities in its outputs.
- Shows significant improvements in reasoning skills on standard benchmarks.
📦 Installation
No installation steps are provided in the original README.
💻 Usage Examples
Basic Usage
If you want to use the PRM, follow these steps:
- Clone the Skywork PRM inference repository:
```bash
git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference
```
- Run PRM inference (the original example was cut off mid-way; the continuation below follows the inference repository's example):
```python
from transformers import AutoTokenizer
from model_utils.prm_model import PRM_MODEL
from model_utils.io_utils import prepare_input, prepare_batch_input_for_model, derive_step_rewards

prm_model_path = "/path/to/prm_model"
tokenizer = AutoTokenizer.from_pretrained(prm_model_path, trust_remote_code=True)
datas = [
    {
        "problem": "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
        "response": "To determine how much money Janet makes every day at the farmers' market, we need to follow these steps:\n1. **Calculate the total number of eggs laid by the ducks per day.**\n Janet's ducks lay 16 eggs per day.\n2. **Determine the number of eggs Janet uses each day.**\n - She eats 3 eggs for breakfast every morning.\n - She bakes muffins for her friends every day with 4 eggs.\n So, the total number of eggs used per day is:\n \\[\n 3 + 4 = 7 \\text{ eggs}\n \\]\n3. **Calculate the number of eggs Janet sells at the farmers' market each day.**\n Subtract the number of eggs used from the total number of eggs laid:\n \\[\n 16 - 7 = 9 \\text{ eggs}\n \\]\n4. **Determine how much money Janet makes from selling the eggs.**\n She sells each egg for $2, so the total amount of money she makes is:\n \\[\n 9 \\times 2 = 18 \\text{ dollars}\n \\]\nTherefore, the amount of money Janet makes every day at the farmers' market is $\\boxed{18}$."
    }
]

# Tokenize each (problem, response) pair, splitting the response into steps.
processed_data = [prepare_input(d["problem"], d["response"], tokenizer=tokenizer, step_token="\n") for d in datas]
input_ids, steps, reward_flags = zip(*processed_data)

# Load the PRM and score every step of each response.
model = PRM_MODEL.from_pretrained(prm_model_path, device_map="auto").eval()
input_ids, attention_mask, reward_flags = prepare_batch_input_for_model(input_ids, tokenizer.pad_token_id, reward_flags)
_, _, rewards = model(input_ids=input_ids, attention_mask=attention_mask, return_probs=True)
step_rewards = derive_step_rewards(rewards, reward_flags)
print("step_rewards:", step_rewards[0])
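The idea behind step-level rewards can be illustrated with a self-contained sketch: the PRM assigns a reward probability to every token, and the reward for each reasoning step is read off at the flagged step-delimiter positions. The function name and data below are hypothetical; the real logic lives in `derive_step_rewards` in the skywork-o1-prm-inference repository.

```python
# Illustrative sketch only, not the repository's implementation:
# collect the reward probability at each flagged (step-end) token.

def derive_step_rewards_sketch(token_rewards, reward_flags):
    """token_rewards: per-token reward probabilities for one sample.
    reward_flags:  0/1 flags, where 1 marks the last token of a step.
    Returns one reward per reasoning step."""
    return [r for r, flag in zip(token_rewards, reward_flags) if flag == 1]

# Example: a 3-step response whose steps end at token indices 2, 5, 8.
token_rewards = [0.1, 0.2, 0.9, 0.3, 0.4, 0.7, 0.2, 0.1, 0.4]
reward_flags  = [0,   0,   1,   0,   0,   1,   0,   0,   1]
print(derive_step_rewards_sketch(token_rewards, reward_flags))  # → [0.9, 0.7, 0.4]
```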
📚 Documentation
Introduction
The Skywork o1 Open model series includes three advanced models:
- [Skywork o1 Open Llama-3.1-8B](https://huggingface.co/Skywork/Skywork-o1-Open-Llama3.1-8B): A chat model based on Llama-3.1-8B, enhanced with "o1-style" data for better reasoning.
- [Skywork o1 Open PRM-Qwen-2.5-1.5B](https://huggingface.co/Skywork/Skywork-o1-Open-PRM-Qwen2.5-1.5B): A specialized model that enhances reasoning through incremental process rewards, suited to small-scale complex problem solving.
- [Skywork o1 Open PRM-Qwen-2.5-7B](https://huggingface.co/Skywork/Skywork-o1-Open-PRM-Qwen2.5-7B): A scaled-up version of the 1.5B model for more demanding reasoning tasks.
Model Information
The Skywork-o1-Open-PRM series is trained from [Qwen2.5-Math-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B-Instruct) and [Qwen2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct).
PRM Evaluation
Evaluation Settings
- Mathematical Evaluation: Used evaluation scripts from [Qwen2.5-Math](https://github.com/QwenLM/Qwen2.5-Math) with the same configuration. Datasets include GSM8K, MATH, GaoKao, CN-Middle School 24, OlympiadBench, AIME-24, and AMC-23.
- Code Evaluation: Adopted evaluation scripts from [Qwen2.5-Coder](https://github.com/QwenLM/Qwen2.5-Coder) with a similar configuration. Datasets include HumanEval, MBPP, and LiveCodeBench (2024.01 to 2024.11).
Evaluation Base Models
The RMs were evaluated on three base models: Qwen2.5-7B-Instruct, Llama3.1-8B-Instruct, and Skywork-o1-Open-8B. The sampling temperature was set to 0.7 for math problems and 1.0 for code tasks.
Compared RMs
- [Qwen2.5-Math-RM-72B](https://huggingface.co/Qwen/Qwen2.5-Math-RM-72B): An open-source ORM by the Qwen team.
- [OpenR-MATH-psa-PRM-7B](https://huggingface.co/openreasoner/Math-psa): An open-source PRM from the OpenR project.
- [RLHFlow-Deepseek-Data-PRM-8B](https://huggingface.co/RLHFlow/Llama3.1-8B-PRM-Deepseek-Data): An open-source PRM from the RLHFlow project.
Evaluation Metrics
- Greedy Sampling Pass@1: Uses greedy sampling for answer generation.
- Majority Voting@64: Randomly samples 64 answers and selects the most common final answer.
- Best-of-N@64: Ranks 64 answers by RM output value. The ORM uses only the final-step reward, while the PRM uses the average reward across all steps.
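As a concrete illustration, the two sample-selection rules can be sketched as follows. This is a toy sketch with hypothetical helper names and made-up scores, not the evaluation harness, which scores real model samples with the trained RMs.

```python
from collections import Counter

def majority_voting(answers):
    """Pick the most common final answer among sampled completions."""
    return Counter(answers).most_common(1)[0][0]

def best_of_n(candidates, is_prm):
    """Rank candidates by RM score: the ORM keeps only the final-step
    reward, while the PRM averages rewards over all steps."""
    def score(step_rewards):
        return sum(step_rewards) / len(step_rewards) if is_prm else step_rewards[-1]
    return max(candidates, key=lambda c: score(c["step_rewards"]))["answer"]

answers = ["18", "20", "18", "16"]
print(majority_voting(answers))  # → "18"

candidates = [
    {"answer": "18", "step_rewards": [0.9, 0.8, 0.9]},   # strong throughout
    {"answer": "20", "step_rewards": [0.2, 0.3, 0.95]},  # only final step looks good
]
print(best_of_n(candidates, is_prm=True))   # → "18" (high average step reward)
print(best_of_n(candidates, is_prm=False))  # → "20" (highest final-step reward)
```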
Evaluation Results
Mathematical Evaluation
Model | Reward Model | Method | GSM8K | MATH | GaoKao | CN-Middle School 24 | OlympiadBench | AIME-24 | AMC-23 | Avg |
---|---|---|---|---|---|---|---|---|---|---|
Skywork-o1-Open-8B | N/A | Greedy Sampling Pass@1 | 91.6 | 78.1 | 63.6 | 67.3 | 43.1 | 13.3 | 55.0 | 58.9 |
Skywork-o1-Open-8B | N/A | Majority Voting@64 | 93.9 | 84.3 | 69.5 | 73.3 | 50.4 | 16.7 | 52.5 | 62.9 |
Skywork-o1-Open-8B | OpenR-MATH-psa-PRM-7B | Best-of-N@64 | 95.1 | 82.7 | 67.1 | 70.3 | 47.6 | 20.0 | 57.5 | 62.9 |
Skywork-o1-Open-8B | RLHFlow-Deepseek-Data-PRM-8B | Best-of-N@64 | 94.4 | 80.1 | 59.1 | 74.3 | 43.0 | 20.0 | 50.0 | 60.1 |
Skywork-o1-Open-8B | Qwen2.5-Math-RM-72B | Best-of-N@64 | 96.1 | 86.9 | 76.0 | 76.2 | 53.3 | 26.7 | 65.0 | 68.6 |
Skywork-o1-Open-8B | Skywork-o1-Open-PRM-1.5B | Best-of-N@64 | 94.5 | 85.0 | 65.6 | 73.3 | 49.9 | 16.7 | 62.5 | 63.9 |
Skywork-o1-Open-8B | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 96.7 | 87.0 | 70.3 | 76.2 | 52.3 | 23.3 | 65.0 | 67.3 |
Qwen2.5-7B-Instruct | N/A | Greedy Sampling Pass@1 | 91.9 | 75.2 | 55.6 | 75.2 | 39.1 | 13.3 | 45.0 | 56.5 |
Qwen2.5-7B-Instruct | N/A | Majority Voting@64 | 93.5 | 78.4 | 55.3 | 78.2 | 40.1 | 13.3 | 62.5 | 60.2 |
Qwen2.5-7B-Instruct | OpenR-MATH-psa-PRM-7B | Best-of-N@64 | 93.9 | 77.9 | 52.4 | 73.3 | 40.7 | 10.0 | 55.0 | 57.6 |
Qwen2.5-7B-Instruct | RLHFlow-Deepseek-Data-PRM-8B | Best-of-N@64 | 94.1 | 78.1 | 53.2 | 75.2 | 39.1 | 16.7 | 55.0 | 58.8 |
Qwen2.5-7B-Instruct | Qwen2.5-Math-RM-72B | Best-of-N@64 | 94.8 | 82.4 | 65.2 | 80.2 | 45.0 | 13.3 | 62.5 | 63.4 |
Qwen2.5-7B-Instruct | Skywork-o1-Open-PRM-1.5B | Best-of-N@64 | 93.3 | 79.8 | 56.1 | 74.3 | 43.9 | 10.0 | 67.5 | 60.7 |
Qwen2.5-7B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 94.9 | 81.9 | 56.3 | 75.2 | 44.9 | 13.3 | 65.0 | 61.6 |
Llama3.1-8B-Instruct | N/A | Greedy Sampling Pass@1 | 85.3 | 49.7 | 25.3 | 47.5 | 16.6 | 6.7 | 27.5 | 36.9 |
Llama3.1-8B-Instruct | N/A | Majority Voting@64 | 90.9 | 62.9 | 28.0 | 56.4 | 26.4 | 13.3 | 37.5 | 45.1 |
Llama3.1-8B-Instruct | OpenR-MATH-psa-PRM-7B | Best-of-N@64 | 91.8 | 59.4 | 24.7 | 47.5 | 23.0 | 13.3 | 35.0 | 42.1 |
Llama3.1-8B-Instruct | RLHFlow-Deepseek-Data-PRM-8B | Best-of-N@64 | 89.8 | 56.1 | 24.0 | 40.6 | 20.4 | 0.0 | 35.0 | 38.0 |
Llama3.1-8B-Instruct | Qwen2.5-Math-RM-72B | Best-of-N@64 | 94.9 | 72.5 | 44.9 | 65.3 | 34.4 | 23.3 | 60.0 | 56.5 |
Llama3.1-8B-Instruct | Skywork-o1-Open-PRM-1.5B | Best-of-N@64 | 91.7 | 65.6 | 26.8 | 49.5 | 27.0 | 16.7 | 60.0 | 48.2 |
Llama3.1-8B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 94.0 | 69.8 | 32.0 | 56.4 | 29.9 | 16.7 | 52.5 | 50.2 |
Code Evaluation
Model | Reward Model | Method | MBPP | MBPP+ | HumanEval | HumanEval+ | LiveCodeBench (2024.01-2024.11) |
---|---|---|---|---|---|---|---|
Skywork-o1-Open-8B | N/A | Greedy Sampling Pass@1 | 79.9 | 65.9 | 82.9 | 78.7 | 26.0 |
Skywork-o1-Open-8B | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 81.2 | 68.5 | 81.1 | 74.4 | 31.3 |
Qwen2.5-7B-Instruct | N/A | Greedy Sampling Pass@1 | 81.7 | 69.3 | 84.8 | 78.0 | 25.3 |
Qwen2.5-7B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 84.9 | 72.5 | 83.5 | 78.0 | 30.7 |
Llama3.1-8B-Instruct | N/A | Greedy Sampling Pass@1 | 75.7 | 64.0 | 68.3 | 63.4 | 16.1 |
Llama3.1-8B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 77.8 | 66.1 | 73.8 | 67.7 | 23.3 |
Quick Start
For detailed PRM usage instructions, refer to the [Skywork-o1-PRM-Inference repository](https://github.com/SkyworkAI/skywork-o1-prm-inference).
📄 License
The license is listed as "other".
🔧 Technical Details
No specific technical details are provided in the original README.
🔗 Links
- 🤗 Hugging Face
- 🤖 ModelScope

