🚀 Skywork o1 Open Model Series
The Skywork o1 Open model series, developed by the Skywork team at Kunlun Inc., introduces models with o1-like slow-thinking and reasoning capabilities, aiming to advance AI reasoning tasks.
✨ Features
- Innovative Reasoning: Incorporates o1-like slow thinking and reasoning capabilities.
- Multiple Model Options: Consists of three advanced models, each with unique strengths for different reasoning tasks.
- Improved Performance: Shows significant improvements in reasoning skills on standard benchmarks.
📦 Installation
No installation steps are provided in the original README.
💻 Usage Examples
Basic Usage
For detailed instructions on using the PRM, please refer to the Skywork-o1-PRM-Inference repository.
Advanced Usage
Hugging Face inference
- Clone the Skywork PRM inference repo:

```bash
git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference
```

- Run PRM inference:
```python
from transformers import AutoTokenizer
from model_utils.prm_model import PRM_MODEL
from model_utils.io_utils import prepare_input, prepare_batch_input_for_model, derive_step_rewards

prm_model_path = "/path/to/prm_model"
tokenizer = AutoTokenizer.from_pretrained(prm_model_path, trust_remote_code=True)
model = PRM_MODEL.from_pretrained(prm_model_path, device_map="auto").eval()

datas = [
    {
        "problem": "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
        "response": "To determine how much money Janet makes every day at the farmers' market, we need to follow these steps:\n1. **Calculate the total number of eggs laid by the ducks per day.**\n Janet's ducks lay 16 eggs per day.\n2. **Determine the number of eggs Janet uses each day.**\n - She eats 3 eggs for breakfast every morning.\n - She bakes muffins for her friends every day with 4 eggs.\n So, the total number of eggs used per day is:\n \\[\n 3 + 4 = 7 \\text{ eggs}\n \\]\n3. **Calculate the number of eggs Janet sells at the farmers' market each day.**\n Subtract the number of eggs used from the total number of eggs laid:\n \\[\n 16 - 7 = 9 \\text{ eggs}\n \\]\n4. **Determine how much money Janet makes from selling the eggs.**\n She sells each egg for $2, so the total amount of money she makes is:\n \\[\n 9 \\times 2 = 18 \\text{ dollars}\n \\]\nTherefore, the amount of money Janet makes every day at the farmers' market is $\\boxed{18}$."
    }
]

# Tokenize each (problem, response) pair and mark step boundaries, batch the
# inputs, run the PRM, and derive per-step rewards; see the
# skywork-o1-prm-inference repository for the full example.
processed_data = [prepare_input(d["problem"], d["response"], tokenizer=tokenizer, step_token="\n") for d in datas]
input_ids, steps, reward_flags = zip(*processed_data)
input_ids, attention_mask, reward_flags = prepare_batch_input_for_model(input_ids, reward_flags, tokenizer.pad_token_id)
_, _, rewards = model(input_ids=input_ids, attention_mask=attention_mask, return_probs=True)
step_rewards = derive_step_rewards(rewards, reward_flags)
```
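The core idea behind a process reward model is that each reasoning step gets its own reward, and the rewards are then aggregated per response. A minimal, self-contained sketch of that idea (the helper names and reward values here are illustrative assumptions, not the repository's implementation):

```python
def split_into_steps(response, step_token="\n"):
    """Split a model response into reasoning steps on the step token."""
    return [s for s in response.split(step_token) if s.strip()]

def aggregate_step_rewards(step_rewards):
    """PRM-style aggregation: average the reward across all steps."""
    return sum(step_rewards) / len(step_rewards)

# Toy trace with hypothetical per-step rewards.
steps = split_into_steps("16 - 7 = 9 eggs are sold.\n9 * 2 = 18 dollars.\nThe answer is 18.")
print(len(steps))                                # 3
print(aggregate_step_rewards([1.0, 0.5, 0.0]))   # 0.5
```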
📚 Documentation
Introduction
We are excited to announce the release of the Skywork o1 Open model series, developed by the Skywork team at Kunlun Inc. This groundbreaking release introduces a series of models that incorporate o1-like slow thinking and reasoning capabilities. The Skywork o1 Open model series includes three advanced models:
- [Skywork o1 Open-Llama-3.1-8B](https://huggingface.co/Skywork/Skywork-o1-Open-Llama3.1-8B): A robust chat model trained on Llama-3.1-8B, enhanced significantly with "o1-style" data to improve reasoning skills.
- [Skywork o1 Open-PRM-Qwen-2.5-1.5B](https://huggingface.co/Skywork/Skywork-o1-Open-PRM-Qwen2.5-1.5B): A specialized model designed to enhance reasoning capability through incremental process rewards, ideal for complex problem solving at a smaller scale.
- [Skywork o1 Open-PRM-Qwen-2.5-7B](https://huggingface.co/Skywork/Skywork-o1-Open-PRM-Qwen2.5-7B): Extends the capabilities of the 1.5B model by scaling up to handle more demanding reasoning tasks, pushing the boundaries of AI reasoning.
Unlike mere reproductions of the OpenAI o1 model, the Skywork o1 Open model series not only exhibits innate thinking, planning, and reflecting capabilities in its outputs, but also shows significant improvements in reasoning skills on standard benchmarks. This series represents a strategic advancement in AI capabilities, moving a previously weaker base model towards the state-of-the-art (SOTA) in reasoning tasks.
If you are interested in the Skywork o1 Open model series, please check out the [o1-llama-3.1-8b](https://huggingface.co/Skywork/o1-llama-3.1-8b) model.
Model Information
The Skywork-o1-Open-PRM models are trained on [Qwen2.5-Math-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B-Instruct) and [Qwen2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct).
PRM Evaluation
Evaluation Settings
Mathematical Evaluation
We utilized the evaluation scripts from [Qwen2.5-Math](https://github.com/QwenLM/Qwen2.5-Math) and followed their configuration to ensure consistency. The selected datasets include GSM8K, MATH, GaoKao, CN-Middle School 24, OlympiadBench, AIME-24, and AMC-23. Among these, GaoKao and CN-Middle School 24 are Chinese datasets, while the remaining datasets are in English. Notably, OlympiadBench, AIME-24, and AMC-23 are competition-level datasets.
Code Evaluation
For code evaluation, we adopted the evaluation scripts from [Qwen2.5-Coder](https://github.com/QwenLM/Qwen2.5-Coder) while largely maintaining the same configuration. The selected datasets include HumanEval, MBPP, and LiveCodeBench, with LiveCodeBench specifically using the 2024.01-2024.11 version. We use the latest version (0.3.1) of EvalPlus due to issues with tests and code sanitization in previous versions.
Evaluation Base Models
We evaluated the performance of RMs on three base models: Qwen2.5-7B-Instruct, Llama3.1-8B-Instruct, and Skywork-o1-Open-8B. Data sampling was conducted to verify the performance of the RMs across different models. The sampling temperature was set to 0.7 for mathematical problems and 1.0 for code-related tasks.
Compared RMs
- [Qwen2.5-Math-RM-72B](https://huggingface.co/Qwen/Qwen2.5-Math-RM-72B): An open-source ORM provided by the Qwen team.
- [OpenR-MATH-psa-PRM-7B](https://huggingface.co/openreasoner/Math-psa): An open-source PRM from the OpenR project.
- [RLHFlow-Deepseek-Data-PRM-8B](https://huggingface.co/RLHFlow/Llama3.1-8B-PRM-Deepseek-Data): An open-source PRM from the RLHFlow project.
Evaluation Metrics
- Greedy Sampling Pass@1: Uses greedy sampling to generate a single answer.
- Majority Voting@64: Randomly samples 64 answers and selects the most frequent final answer.
- Best-of-N@64: Ranks 64 answers based on output values provided by the Reward Model (RM). The weighting methods differ for ORM and PRM:
  - For ORM, only the reward from the final step is used.
  - For PRM, the average reward across all steps is used for weighting.
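The selection rules above can be sketched in a few lines. This is an illustrative toy (the answers and reward values are hypothetical), not the evaluation harness itself:

```python
from collections import Counter

def majority_vote(answers):
    """Majority Voting@N: pick the most frequent final answer among N samples."""
    return Counter(answers).most_common(1)[0][0]

def best_of_n(answers, step_rewards, mode="prm"):
    """Best-of-N@N: weight each sampled answer by its reward-model score.

    For an ORM only the final step's reward is used; for a PRM the average
    reward over all steps is used. Identical answers accumulate weight.
    """
    weights = Counter()
    for ans, rewards in zip(answers, step_rewards):
        score = rewards[-1] if mode == "orm" else sum(rewards) / len(rewards)
        weights[ans] += score
    return max(weights, key=weights.get)

# Four sampled answers with hypothetical per-step rewards.
answers = ["18", "20", "18", "14"]
step_rewards = [
    [0.9, 0.8, 0.9],   # a consistently well-rewarded trace
    [0.9, 0.2, 0.1],   # starts well, then degrades
    [0.7, 0.8, 0.8],
    [0.3, 0.2, 0.2],
]
print(majority_vote(answers))                    # 18
print(best_of_n(answers, step_rewards, "prm"))   # 18
print(best_of_n(answers, step_rewards, "orm"))   # 18
```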
Evaluation Results
Mathematical Evaluation
Model | Reward Model | Method | GSM8K | MATH | GaoKao | CN-Middle School 24 | OlympiadBench | AIME-24 | AMC-23 | Avg |
---|---|---|---|---|---|---|---|---|---|---|
Skywork-o1-Open-8B | N/A | Greedy Sampling Pass@1 | 91.6 | 78.1 | 63.6 | 67.3 | 43.1 | 13.3 | 55.0 | 58.9 |
Skywork-o1-Open-8B | N/A | Majority Voting@64 | 93.9 | 84.3 | 69.5 | 73.3 | 50.4 | 16.7 | 52.5 | 62.9 |
Skywork-o1-Open-8B | OpenR-MATH-psa-PRM-7B | Best-of-N@64 | 95.1 | 82.7 | 67.1 | 70.3 | 47.6 | 20.0 | 57.5 | 62.9 |
Skywork-o1-Open-8B | RLHFlow-Deepseek-Data-PRM-8B | Best-of-N@64 | 94.4 | 80.1 | 59.1 | 74.3 | 43.0 | 20.0 | 50.0 | 60.1 |
Skywork-o1-Open-8B | Qwen2.5-Math-RM-72B | Best-of-N@64 | 96.1 | 86.9 | 76.0 | 76.2 | 53.3 | 26.7 | 65.0 | 68.6 |
Skywork-o1-Open-8B | Skywork-o1-Open-PRM-1.5B | Best-of-N@64 | 94.5 | 85.0 | 65.6 | 73.3 | 49.9 | 16.7 | 62.5 | 63.9 |
Skywork-o1-Open-8B | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 96.7 | 87.0 | 70.3 | 76.2 | 52.3 | 23.3 | 65.0 | 67.3 |
Qwen2.5-7B-Instruct | N/A | Greedy Sampling Pass@1 | 91.9 | 75.2 | 55.6 | 75.2 | 39.1 | 13.3 | 45.0 | 56.5 |
Qwen2.5-7B-Instruct | N/A | Majority Voting@64 | 93.5 | 78.4 | 55.3 | 78.2 | 40.1 | 13.3 | 62.5 | 60.2 |
Qwen2.5-7B-Instruct | OpenR-MATH-psa-PRM-7B | Best-of-N@64 | 93.9 | 77.9 | 52.4 | 73.3 | 40.7 | 10.0 | 55.0 | 57.6 |
Qwen2.5-7B-Instruct | RLHFlow-Deepseek-Data-PRM-8B | Best-of-N@64 | 94.1 | 78.1 | 53.2 | 75.2 | 39.1 | 16.7 | 55.0 | 58.8 |
Qwen2.5-7B-Instruct | Qwen2.5-Math-RM-72B | Best-of-N@64 | 94.8 | 82.4 | 65.2 | 80.2 | 45.0 | 13.3 | 62.5 | 63.4 |
Qwen2.5-7B-Instruct | Skywork-o1-Open-PRM-1.5B | Best-of-N@64 | 93.3 | 79.8 | 56.1 | 74.3 | 43.9 | 10.0 | 67.5 | 60.7 |
Qwen2.5-7B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 94.9 | 81.9 | 56.3 | 75.2 | 44.9 | 13.3 | 65.0 | 61.6 |
Llama3.1-8B-Instruct | N/A | Greedy Sampling Pass@1 | 85.3 | 49.7 | 25.3 | 47.5 | 16.6 | 6.7 | 27.5 | 36.9 |
Llama3.1-8B-Instruct | N/A | Majority Voting@64 | 90.9 | 62.9 | 28.0 | 56.4 | 26.4 | 13.3 | 37.5 | 45.1 |
Llama3.1-8B-Instruct | OpenR-MATH-psa-PRM-7B | Best-of-N@64 | 91.8 | 59.4 | 24.7 | 47.5 | 23.0 | 13.3 | 35.0 | 42.1 |
Llama3.1-8B-Instruct | RLHFlow-Deepseek-Data-PRM-8B | Best-of-N@64 | 89.8 | 56.1 | 24.0 | 40.6 | 20.4 | 0.0 | 35.0 | 38.0 |
Llama3.1-8B-Instruct | Qwen2.5-Math-RM-72B | Best-of-N@64 | 94.9 | 72.5 | 44.9 | 65.3 | 34.4 | 23.3 | 60.0 | 56.5 |
Llama3.1-8B-Instruct | Skywork-o1-Open-PRM-1.5B | Best-of-N@64 | 91.7 | 65.6 | 26.8 | 49.5 | 27.0 | 16.7 | 60.0 | 48.2 |
Llama3.1-8B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 94.0 | 69.8 | 32.0 | 56.4 | 29.9 | 16.7 | 52.5 | 50.2 |
Code Evaluation
Since the compared PRMs have not been trained on code-related tasks, this section focuses solely on the performance of Skywork-o1-Open-PRM.
Model | Reward Model | Method | MBPP | MBPP+ | HumanEval | HumanEval+ | LiveCodeBench 2024.01-2024.11 |
---|---|---|---|---|---|---|---|
Skywork-o1-Open-8B | N/A | Greedy Sampling Pass@1 | 79.9 | 65.9 | 82.9 | 78.7 | 26.0 |
Skywork-o1-Open-8B | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 81.2 | 68.5 | 81.1 | 74.4 | 31.3 |
Qwen2.5-7B-Instruct | N/A | Greedy Sampling Pass@1 | 81.7 | 69.3 | 84.8 | 78.0 | 25.3 |
Qwen2.5-7B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 84.9 | 72.5 | 83.5 | 78.0 | 30.7 |
Llama3.1-8B-Instruct | N/A | Greedy Sampling Pass@1 | 75.7 | 64.0 | 68.3 | 63.4 | 16.1 |
Llama3.1-8B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 77.8 | 66.1 | 73.8 | 67.7 | 23.3 |
🔧 Technical Details
No technical details are provided in the original README.
📄 License
The license is "other".

