🚀 Skywork o1 Open Model Series
The Skywork o1 Open model series, developed by the Skywork team at Kunlun Inc., brings o1-like slow-thinking and reasoning capabilities to open models. The series aims to advance AI reasoning skills toward the state of the art on related tasks.
✨ Features
- Incorporates o1-like slow thinking and reasoning capabilities.
- Exhibits innate thinking, planning, and reflecting capabilities in its outputs.
- Shows significant improvements in reasoning skills on standard benchmarks.
📦 Installation
No installation steps are provided in the original README.
💻 Usage Examples
Basic Usage
If you want to use the PRM, follow these steps:
- Clone the Skywork PRM inference repository:
```bash
git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference
```
- Run PRM inference (the original example was cut off mid-way; the continuation below follows the inference repository's example):
```python
from transformers import AutoTokenizer
from model_utils.prm_model import PRM_MODEL
from model_utils.io_utils import prepare_input, prepare_batch_input_for_model, derive_step_rewards

prm_model_path = "/path/to/prm_model"
tokenizer = AutoTokenizer.from_pretrained(prm_model_path, trust_remote_code=True)
datas = [
    {
        "problem": "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
        "response": "To determine how much money Janet makes every day at the farmers' market, we need to follow these steps:\n1. **Calculate the total number of eggs laid by the ducks per day.**\n Janet's ducks lay 16 eggs per day.\n2. **Determine the number of eggs Janet uses each day.**\n - She eats 3 eggs for breakfast every morning.\n - She bakes muffins for her friends every day with 4 eggs.\n So, the total number of eggs used per day is:\n \\[\n 3 + 4 = 7 \\text{ eggs}\n \\]\n3. **Calculate the number of eggs Janet sells at the farmers' market each day.**\n Subtract the number of eggs used from the total number of eggs laid:\n \\[\n 16 - 7 = 9 \\text{ eggs}\n \\]\n4. **Determine how much money Janet makes from selling the eggs.**\n She sells each egg for $2, so the total amount of money she makes is:\n \\[\n 9 \\times 2 = 18 \\text{ dollars}\n \\]\nTherefore, the amount of money Janet makes every day at the farmers' market is $\\boxed{18}$."
    }
]

# Tokenize each (problem, response) pair, splitting the response into steps.
processed_data = [prepare_input(d["problem"], d["response"], tokenizer=tokenizer, step_token="\n") for d in datas]
input_ids, steps, reward_flags = zip(*processed_data)

# Load the PRM and score every step of each response.
model = PRM_MODEL.from_pretrained(prm_model_path, device_map="auto").eval()
input_ids, attention_mask, reward_flags = prepare_batch_input_for_model(input_ids, tokenizer.pad_token_id, reward_flags)
_, _, rewards = model(input_ids=input_ids, attention_mask=attention_mask, return_probs=True)
step_rewards = derive_step_rewards(rewards, reward_flags)
print("step_rewards:", step_rewards[0])
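The idea behind step-level rewards can be illustrated with a self-contained sketch: the PRM assigns a reward probability to every token, and the reward for each reasoning step is read off at the flagged step-delimiter positions. The function name and data below are hypothetical; the real logic lives in `derive_step_rewards` in the skywork-o1-prm-inference repository.

```python
# Illustrative sketch only, not the repository's implementation:
# collect the reward probability at each flagged (step-end) token.

def derive_step_rewards_sketch(token_rewards, reward_flags):
    """token_rewards: per-token reward probabilities for one sample.
    reward_flags:  0/1 flags, where 1 marks the last token of a step.
    Returns one reward per reasoning step."""
    return [r for r, flag in zip(token_rewards, reward_flags) if flag == 1]

# Example: a 3-step response whose steps end at token indices 2, 5, 8.
token_rewards = [0.1, 0.2, 0.9, 0.3, 0.4, 0.7, 0.2, 0.1, 0.4]
reward_flags  = [0,   0,   1,   0,   0,   1,   0,   0,   1]
print(derive_step_rewards_sketch(token_rewards, reward_flags))  # → [0.9, 0.7, 0.4]
```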
📚 Documentation
Introduction
The Skywork o1 Open model series includes three advanced models:
- [Skywork o1 Open Llama-3.1-8B](https://huggingface.co/Skywork/Skywork-o1-Open-Llama3.1-8B): A chat model based on Llama-3.1-8B, enhanced with "o1-style" data for better reasoning.
- [Skywork o1 Open PRM-Qwen-2.5-1.5B](https://huggingface.co/Skywork/Skywork-o1-Open-PRM-Qwen2.5-1.5B): A specialized model that enhances reasoning through incremental process rewards, suited to small-scale complex problem solving.
- [Skywork o1 Open PRM-Qwen-2.5-7B](https://huggingface.co/Skywork/Skywork-o1-Open-PRM-Qwen2.5-7B): A scaled-up version of the 1.5B model for more demanding reasoning tasks.
Model Information
The Skywork-o1-Open-PRM series is trained from [Qwen2.5-Math-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B-Instruct) and [Qwen2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct).
PRM Evaluation
Evaluation Settings
- Mathematical Evaluation: Used evaluation scripts from [Qwen2.5-Math](https://github.com/QwenLM/Qwen2.5-Math) with the same configuration. Datasets include GSM8K, MATH, GaoKao, CN-Middle School 24, OlympiadBench, AIME-24, and AMC-23.
- Code Evaluation: Adopted evaluation scripts from [Qwen2.5-Coder](https://github.com/QwenLM/Qwen2.5-Coder) with a similar configuration. Datasets include HumanEval, MBPP, and LiveCodeBench (2024.01 to 2024.11).
Evaluation Base Models
The RMs were evaluated on three base models: Qwen2.5-7B-Instruct, Llama3.1-8B-Instruct, and Skywork-o1-Open-8B. The sampling temperature was set to 0.7 for math problems and 1.0 for code tasks.
Compared RMs
- [Qwen2.5-Math-RM-72B](https://huggingface.co/Qwen/Qwen2.5-Math-RM-72B): An open-source ORM by the Qwen team.
- [OpenR-MATH-psa-PRM-7B](https://huggingface.co/openreasoner/Math-psa): An open-source PRM from the OpenR project.
- [RLHFlow-Deepseek-Data-PRM-8B](https://huggingface.co/RLHFlow/Llama3.1-8B-PRM-Deepseek-Data): An open-source PRM from the RLHFlow project.
Evaluation Metrics
- Greedy Sampling Pass@1: Uses greedy sampling for answer generation.
- Majority Voting@64: Randomly samples 64 answers and selects the most common final answer.
- Best-of-N@64: Ranks 64 answers by RM output value. The ORM uses only the final-step reward, while the PRM uses the average reward across all steps.
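As a concrete illustration, the two sample-selection rules can be sketched as follows. This is a toy sketch with hypothetical helper names and made-up scores, not the evaluation harness, which scores real model samples with the trained RMs.

```python
from collections import Counter

def majority_voting(answers):
    """Pick the most common final answer among sampled completions."""
    return Counter(answers).most_common(1)[0][0]

def best_of_n(candidates, is_prm):
    """Rank candidates by RM score: the ORM keeps only the final-step
    reward, while the PRM averages rewards over all steps."""
    def score(step_rewards):
        return sum(step_rewards) / len(step_rewards) if is_prm else step_rewards[-1]
    return max(candidates, key=lambda c: score(c["step_rewards"]))["answer"]

answers = ["18", "20", "18", "16"]
print(majority_voting(answers))  # → "18"

candidates = [
    {"answer": "18", "step_rewards": [0.9, 0.8, 0.9]},   # strong throughout
    {"answer": "20", "step_rewards": [0.2, 0.3, 0.95]},  # only final step looks good
]
print(best_of_n(candidates, is_prm=True))   # → "18" (high average step reward)
print(best_of_n(candidates, is_prm=False))  # → "20" (highest final-step reward)
```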
Evaluation Results
Mathematical Evaluation
Model | Reward Model | Method | GSM8K | MATH | GaoKao | CN-Middle School 24 | OlympiadBench | AIME-24 | AMC-23 | Avg |
---|---|---|---|---|---|---|---|---|---|---|
Skywork-o1-Open-8B | N/A | Greedy Sampling Pass@1 | 91.6 | 78.1 | 63.6 | 67.3 | 43.1 | 13.3 | 55.0 | 58.9 |
Skywork-o1-Open-8B | N/A | Majority Voting@64 | 93.9 | 84.3 | 69.5 | 73.3 | 50.4 | 16.7 | 52.5 | 62.9 |
Skywork-o1-Open-8B | OpenR-MATH-psa-PRM-7B | Best-of-N@64 | 95.1 | 82.7 | 67.1 | 70.3 | 47.6 | 20.0 | 57.5 | 62.9 |
Skywork-o1-Open-8B | RLHFlow-Deepseek-Data-PRM-8B | Best-of-N@64 | 94.4 | 80.1 | 59.1 | 74.3 | 43.0 | 20.0 | 50.0 | 60.1 |
Skywork-o1-Open-8B | Qwen2.5-Math-RM-72B | Best-of-N@64 | 96.1 | 86.9 | 76.0 | 76.2 | 53.3 | 26.7 | 65.0 | 68.6 |
Skywork-o1-Open-8B | Skywork-o1-Open-PRM-1.5B | Best-of-N@64 | 94.5 | 85.0 | 65.6 | 73.3 | 49.9 | 16.7 | 62.5 | 63.9 |
Skywork-o1-Open-8B | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 96.7 | 87.0 | 70.3 | 76.2 | 52.3 | 23.3 | 65.0 | 67.3 |
Qwen2.5-7B-Instruct | N/A | Greedy Sampling Pass@1 | 91.9 | 75.2 | 55.6 | 75.2 | 39.1 | 13.3 | 45.0 | 56.5 |
Qwen2.5-7B-Instruct | N/A | Majority Voting@64 | 93.5 | 78.4 | 55.3 | 78.2 | 40.1 | 13.3 | 62.5 | 60.2 |
Qwen2.5-7B-Instruct | OpenR-MATH-psa-PRM-7B | Best-of-N@64 | 93.9 | 77.9 | 52.4 | 73.3 | 40.7 | 10.0 | 55.0 | 57.6 |
Qwen2.5-7B-Instruct | RLHFlow-Deepseek-Data-PRM-8B | Best-of-N@64 | 94.1 | 78.1 | 53.2 | 75.2 | 39.1 | 16.7 | 55.0 | 58.8 |
Qwen2.5-7B-Instruct | Qwen2.5-Math-RM-72B | Best-of-N@64 | 94.8 | 82.4 | 65.2 | 80.2 | 45.0 | 13.3 | 62.5 | 63.4 |
Qwen2.5-7B-Instruct | Skywork-o1-Open-PRM-1.5B | Best-of-N@64 | 93.3 | 79.8 | 56.1 | 74.3 | 43.9 | 10.0 | 67.5 | 60.7 |
Qwen2.5-7B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 94.9 | 81.9 | 56.3 | 75.2 | 44.9 | 13.3 | 65.0 | 61.6 |
Llama3.1-8B-Instruct | N/A | Greedy Sampling Pass@1 | 85.3 | 49.7 | 25.3 | 47.5 | 16.6 | 6.7 | 27.5 | 36.9 |
Llama3.1-8B-Instruct | N/A | Majority Voting@64 | 90.9 | 62.9 | 28.0 | 56.4 | 26.4 | 13.3 | 37.5 | 45.1 |
Llama3.1-8B-Instruct | OpenR-MATH-psa-PRM-7B | Best-of-N@64 | 91.8 | 59.4 | 24.7 | 47.5 | 23.0 | 13.3 | 35.0 | 42.1 |
Llama3.1-8B-Instruct | RLHFlow-Deepseek-Data-PRM-8B | Best-of-N@64 | 89.8 | 56.1 | 24.0 | 40.6 | 20.4 | 0.0 | 35.0 | 38.0 |
Llama3.1-8B-Instruct | Qwen2.5-Math-RM-72B | Best-of-N@64 | 94.9 | 72.5 | 44.9 | 65.3 | 34.4 | 23.3 | 60.0 | 56.5 |
Llama3.1-8B-Instruct | Skywork-o1-Open-PRM-1.5B | Best-of-N@64 | 91.7 | 65.6 | 26.8 | 49.5 | 27.0 | 16.7 | 60.0 | 48.2 |
Llama3.1-8B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 94.0 | 69.8 | 32.0 | 56.4 | 29.9 | 16.7 | 52.5 | 50.2 |
Code Evaluation
Model | Reward Model | Method | MBPP | MBPP+ | HumanEval | HumanEval+ | LiveCodeBench (2024.01-2024.11) |
---|---|---|---|---|---|---|---|
Skywork-o1-Open-8B | N/A | Greedy Sampling Pass@1 | 79.9 | 65.9 | 82.9 | 78.7 | 26.0 |
Skywork-o1-Open-8B | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 81.2 | 68.5 | 81.1 | 74.4 | 31.3 |
Qwen2.5-7B-Instruct | N/A | Greedy Sampling Pass@1 | 81.7 | 69.3 | 84.8 | 78.0 | 25.3 |
Qwen2.5-7B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 84.9 | 72.5 | 83.5 | 78.0 | 30.7 |
Llama3.1-8B-Instruct | N/A | Greedy Sampling Pass@1 | 75.7 | 64.0 | 68.3 | 63.4 | 16.1 |
Llama3.1-8B-Instruct | Skywork-o1-Open-PRM-7B | Best-of-N@64 | 77.8 | 66.1 | 73.8 | 67.7 | 23.3 |
Quick Start
For detailed PRM usage instructions, refer to the [Skywork-o1-PRM-Inference repository](https://github.com/SkyworkAI/skywork-o1-prm-inference).
📄 License
The license is listed as "other".
🔧 Technical Details
No specific technical details are provided in the original README.
🔗 Links
- 🤗 Hugging Face
- 🤖 ModelScope

