Skywork o1 Open-Llama-3.1-8B Open-Source Dialogue Model - Enhance Inference Ability and Provide High-Quality Q&A Services

Skywork O1 Open Llama 3.1 8B

Developed by Skywork

Skywork o1 Open-Llama-3.1-8B is a powerful conversational model trained on Llama-3.1-8B, significantly enhanced in reasoning capabilities through 'slow thinking' style data.

Large Language Model

Safetensors

Open Source License: Other

Tags: Slow Thinking Reasoning, Math Programming Enhancement, Progressive Reward Training

Release Date: 11/26/2024

Model Overview

An advanced model with o1-like slow thinking reasoning abilities, specializing in solving complex problems in mathematics, programming, and logical reasoning.

Model Features

Slow Thinking Reasoning Ability

Achieves deep thinking, planning, and reflection capabilities through proprietary training methods.

Three-Phase Training Program

Includes reflective reasoning training, reasoning capability reinforcement learning, and reasoning planning.

Q* Online Reasoning Algorithm

An advanced reasoning algorithm implemented and publicly released for the first time, optimizing reasoning paths.

Model Capabilities

Complex Problem Solving

Mathematical Reasoning

Logical Reasoning

Programming Problem Solving

Ethical Decision Analysis

Self-Verification

Use Cases

Education

Math Problem Solving

Solves various mathematical word problems and equations.

Performs excellently on math test sets like GSM8K.

Programming

Algorithm Problem Solving

Analyzes and solves programming algorithm problems.

Performs strongly on programming benchmarks such as HumanEval.

Logical Analysis

Logical Reasoning Problems

Solves various logical relationship and analogy reasoning problems.

Demonstrates excellent logical analysis capabilities.

🚀 Skywork o1 Open Model Series

The Skywork o1 Open model series, developed by the Skywork team at Kunlun Inc, incorporates o1-like slow thinking and reasoning capabilities. It includes three advanced models, offering enhanced performance in various reasoning tasks and pushing the boundaries of AI reasoning.

🚀 Quick Start

Run Inference

To run inference with Skywork-o1-Open-Llama3.1-8B, use the following code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

system_prompt = """You are Skywork-o1, a thinking model developed by Skywork AI, specializing in solving complex problems involving mathematics, coding, and logical reasoning through deep thought. When faced with a user's request, you first engage in a lengthy and in-depth thinking process to explore possible solutions to the problem. After completing your thoughts, you then provide a detailed explanation of the solution process in your response."""

# An Example Case
problem = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"

user_message = problem

conversation = [
    {
        "role": "system",
        "content": system_prompt
    },
    {
        "role": "user", 
        "content": user_message
    }
]

# Hub ID or local path of the downloaded checkpoint; adjust to your setup
model_name = "Skywork-o1-Open-Llama3.1-8B"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained(model_name)

input_ids = tokenizer.apply_chat_template(
    conversation, 
    tokenize=True, 
    add_generation_prompt=True,
    return_tensors="pt").to(model.device)

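# Greedy decoding: with do_sample=False the temperature setting is ignored,
# so it is omitted here to avoid a generation-config warning.
generation = model.generate(
    input_ids=input_ids,
    max_new_tokens=2048,
    do_sample=False,
    pad_token_id=128009)  # <|eot_id|> in the Llama 3.1 tokenizer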

completion = tokenizer.decode(
    generation[0][len(input_ids[0]):], 
    skip_special_tokens=True, 
    clean_up_tokenization_spaces=True)

print(completion)
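Several of the reasoning examples further down end with a final answer wrapped in `\boxed{...}`. When a completion follows that convention, the answer can be pulled out with a small brace-matching helper. This is a minimal sketch: `extract_boxed` is a hypothetical helper name, not part of the model's tooling or of `transformers`, and it assumes the last `\boxed{...}` in the text holds the final answer.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent.

    Hypothetical helper: assumes the final answer is the last boxed expression.
    """
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None

    depth = 1          # we are just inside the opening brace of \boxed{
    collected = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace of \boxed{ found
                return "".join(collected)
        collected.append(ch)
    return None        # braces never balanced


if __name__ == "__main__":
    print(extract_boxed(r"Thus, the final answer is: \[ \boxed{\frac{825}{7}} \]"))
```

Not every completion is guaranteed to contain a boxed answer, so callers should handle the `None` case.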

✨ Features

Model Series Introduction

The Skywork o1 Open model series includes three advanced models:

Skywork o1 Open-Llama-3.1-8B: A robust chat model trained on Llama-3.1-8B, enhanced with "o1-style" data for better reasoning skills.
Skywork o1 Open-PRM-Qwen-2.5-1.5B: A specialized model for enhancing reasoning through incremental process rewards, suitable for small-scale complex problem-solving.
Skywork o1 Open-PRM-Qwen-2.5-7B: Scaled up from the 1.5B model to handle more demanding reasoning tasks.

Key Capabilities

Enhanced model thinking and planning capabilities.
Advanced self-reflection and self-verification abilities.
Adept at handling various reasoning challenges, including common-sense, logical, mathematical, ethical decision-making, and logical trap problems.

Model Performance

The Skywork o1 Open 8B model shows notable improvements on mathematical and coding benchmarks, outperforming prior state-of-the-art models of similar size such as Qwen-2.5-7B-Instruct.

[Figures: Math Benchmark Result; Code Benchmark Result]

📚 Documentation

Methods

The Skywork o1 Open series' cognitive abilities are developed through a three-stage training scheme:

Reflective Reasoning Training: Use a proprietary multi-agent system to generate high-quality, diverse data for long-thinking tasks, followed by continuous pre-training and supervised fine-tuning.
Reinforcement Learning for Reasoning Capabilities: Introduce the Skywork o1 Process Reward Model (PRM) to enhance step-by-step reasoning. Combine it with proprietary reasoning reinforcement algorithms.
Reasoning Planning: Deploy Tiangong's proprietary Q* online reasoning algorithm alongside model-based thinking to search for optimal reasoning paths.
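The role a process reward model plays in step-by-step reasoning can be illustrated with a generic best-of-N selection: each candidate solution is split into steps, every step receives a score, and the candidate whose weakest step scores highest is kept. The sketch below is only an illustration under stated assumptions. It is not the Skywork o1 PRM, the proprietary reinforcement algorithm, or the Q* algorithm, none of which are specified here; `score_solution`, `best_of_n`, and `toy_scorer` are hypothetical names, and the scorer is a toy stand-in for a learned reward model.

```python
from typing import Callable, List, Sequence


def score_solution(steps: Sequence[str], step_scorer: Callable[[str], float]) -> float:
    """Aggregate per-step rewards; here the weakest step bounds the whole solution.

    Assumes `steps` is non-empty. Taking the minimum is one common choice;
    a product or mean of step scores would be alternatives.
    """
    return min(step_scorer(step) for step in steps)


def best_of_n(candidates: Sequence[List[str]],
              step_scorer: Callable[[str], float]) -> List[str]:
    """Return the candidate solution with the highest aggregated process reward."""
    return max(candidates, key=lambda steps: score_solution(steps, step_scorer))


def toy_scorer(step: str) -> float:
    """Toy stand-in for a learned PRM: distrust steps that admit to guessing."""
    return 0.2 if "guess" in step else 0.9


if __name__ == "__main__":
    candidates = [
        ["rate = 1/15 + 1/10", "guess that the answer is 5"],
        ["rate = 1/15 + 1/10 = 1/6", "days = 1 / (1/6) = 6"],
    ]
    print(best_of_n(candidates, toy_scorer))
```

In a real setup the scorer would be a model call that rates each partial solution, which is considerably more expensive than this toy function.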

Highlights

The Skywork o1 Open series stands out for its enhanced thinking, planning, self-reflection, and self-verification capabilities, which help it handle a wider variety of reasoning challenges than previous large models.

[Figure: Demo Case]

Reasoning Examples

Here are some examples demonstrating the model's reasoning capabilities:

Mathematical Problem in Chinese

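Input (translated from Chinese): A project takes the Shandong team 15 days to complete alone and the Jiangsu team 10 days to complete alone. If the two teams work together, how many days will it take?

Output (translated from Chinese): OK, let's solve this interesting work-rate problem. I'll analyze it step by step in a relaxed tone.

First, let's understand the information given in the problem:
1. The Shandong team needs 15 days to complete the project alone
2. The Jiangsu team needs 10 days to complete the project alone

Now we need to find how long the two teams take when working together. Let's start with each team's work rate:

1. Shandong team's rate:
   1 project in 15 days, so the amount completed per day is 1/15

2. Jiangsu team's rate:
   1 project in 10 days, so the amount completed per day is 1/10

Wait, let's double-check this calculation... Yes, it is correct.

Now, when the two teams work together, the amount completed per day is the sum of the two rates:
1/15 + 1/10 = 2/30 + 3/30 = 5/30 = 1/6

This means that together the two teams complete 1/6 of the project each day.

Next, we need to work out how many days it takes to complete 1 project. We can use the following equation:
(1/6) × x = 1

Solving this equation:
x = 1 ÷ (1/6) = 6

So the two teams working together need 6 days to complete the project.

Let's verify once more:
- 6 days × (1/6) = 1
- This indeed equals 1, so our calculation is correct.

Therefore, the final answer is: the two teams working together need 6 days to complete the project.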
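The work-rate arithmetic in this example can be reproduced exactly with rational numbers. The sketch below uses Python's standard `fractions` module; `days_together` is a hypothetical helper name introduced only for illustration.

```python
from fractions import Fraction


def days_together(*solo_days: int) -> Fraction:
    """Days needed when teams with the given solo completion times work together."""
    # Each team completes 1/d of the project per day; rates add up.
    combined_rate = sum(Fraction(1, d) for d in solo_days)
    return 1 / combined_rate


if __name__ == "__main__":
    # 1/15 + 1/10 = 1/6 of the project per day
    print(days_together(15, 10))
```

Using `Fraction` rather than floats keeps intermediate values such as 1/15 exact, so the result can be compared to 6 without a tolerance.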

Mathematical Problem in English

Input: Determine the value of $y$ in its simplest form, where $y = \frac{p + q\sqrt{r}}{s}$, given that $\frac{7y}{5} - 2 = \frac{4}{y}$, and $p, q, r, s$ are integers. What is $\frac{pqr}{s}$?

Output: To determine the value of \( y \) in the form \( y = \frac{p + q\sqrt{r}}{s} \), we start with the given equation:

\[
\frac{7y}{5} - 2 = \frac{4}{y}
\]

First, we eliminate the denominators by multiplying both sides of the equation by \( 5y \):

\[
7y^2 - 10y = 20
\]

Next, we rearrange the equation to set it to zero:

\[
7y^2 - 10y - 20 = 0
\]

We solve this quadratic equation using the quadratic formula \( y = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \), where \( a = 7 \), \( b = -10 \), and \( c = -20 \). Plugging in these values, we get:

\[
y = \frac{-(-10) \pm \sqrt{(-10)^2 - 4 \cdot 7 \cdot (-20)}}{2 \cdot 7} = \frac{10 \pm \sqrt{100 + 560}}{14} = \frac{10 \pm \sqrt{660}}{14}
\]

We simplify \( \sqrt{660} \) as follows:

\[
\sqrt{660} = \sqrt{4 \cdot 165} = 2\sqrt{165}
\]

Substituting this back into the quadratic formula, we have:

\[
y = \frac{10 \pm 2\sqrt{165}}{14} = \frac{5 \pm \sqrt{165}}{7}
\]

Since we are asked for the value of \( y \) in the form \( \frac{p + q\sqrt{r}}{s} \), we can choose either the positive or negative root. Let's take the positive root for simplicity:

\[
y = \frac{5 + \sqrt{165}}{7}
\]

Here, \( p = 5 \), \( q = 1 \), \( r = 165 \), and \( s = 7 \). We need to find the value of \( \frac{pqr}{s} \):

\[
\frac{pqr}{s} = \frac{5 \cdot 1 \cdot 165}{7} = \frac{825}{7}
\]

Thus, the final answer is:

\[
\boxed{\frac{825}{7}}
\]
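The algebra in this example can be checked numerically. The sketch below substitutes both roots of 7y^2 - 10y - 20 = 0 back into the original equation and recomputes pqr/s for the positive root chosen in the output. The function name `residual` is hypothetical. Note that the negative root also satisfies the equation; with q = -1 it would give -825/7, so the positive-root choice is a convention adopted in the output rather than something the check below enforces.

```python
import math
from fractions import Fraction


def residual(y: float) -> float:
    """Left side minus right side of 7y/5 - 2 = 4/y; zero for a true solution."""
    return 7 * y / 5 - 2 - 4 / y


# Both roots of 7y^2 - 10y - 20 = 0, written as (5 +/- sqrt(165)) / 7
roots = [(5 + math.sqrt(165)) / 7, (5 - math.sqrt(165)) / 7]

# p, q, r, s read off from the positive root (5 + 1*sqrt(165)) / 7
p, q, r, s = 5, 1, 165, 7
answer = Fraction(p * q * r, s)

if __name__ == "__main__":
    for y in roots:
        print(f"y = {y:.6f}, residual = {residual(y):.2e}")
    print("pqr/s =", answer)
```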

Logical Problem in Chinese

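Input (translated from Chinese): Gunpowder : firecrackers : double-bang firecracker (二踢脚) corresponds to which of the following options? A. Black iron (乌铁) : knives : three-edged knife; B. Red brick : buildings : siheyuan (courtyard house); C. Clear water : baijiu : Wuliangye; D. Fir wood : musical instruments : six-string guitar

Output (translated from Chinese): OK, let's work through this logical-relationship question together. We need to find the relationship most similar to "gunpowder : firecrackers : double-bang firecracker".

First, let's analyze the given relationship:
Gunpowder → firecrackers → double-bang firecracker

1. Gunpowder is the raw material for making firecrackers.
2. Firecrackers are made from gunpowder.
3. A double-bang firecracker is a kind of firecracker.

Now, let's check the options one by one:

A. Black iron : knives : three-edged knife
   Black iron is a raw material for making knives, knives are made from black iron, and a three-edged knife is a kind of knife.
   This relationship looks very similar!

B. Red brick : buildings : siheyuan
   Red brick is one building material, but not all buildings use red brick.
   A siheyuan is a kind of building, but not all buildings are called siheyuan.
   This relationship is not a good match.

C. Clear water : baijiu : Wuliangye
   Clear water is a raw material for making baijiu, and baijiu is made from clear water.
   But Wuliangye is a kind of baijiu, not a raw material for baijiu.
   This relationship is not a good match.

D. Fir wood : musical instruments : six-string guitar
   Fir wood is a raw material for making musical instruments, and musical instruments are made from fir wood.
   But a six-string guitar is a kind of musical instrument, not a raw material for musical instruments.
   This relationship is not a good match either.

Wait, let me think about this more carefully...

After re-examining the options, I am even more convinced that option A is the best answer. Let me explain why:

1. The relationship of black iron as the raw material, knives as the product, and three-edged knife as the specific product is the most similar to the relationship among gunpowder, firecrackers, and double-bang firecracker.
2. Every option follows the pattern "raw material → product → specific product".
3. The other options either lack the raw-material-to-product relationship (such as B and C) or reverse the relationship between product and specific product (such as D).

Therefore, the final answer is: A. Black iron : knives : three-edged knife.

This option most accurately reflects the logical structure of the original relationship.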

Logical Problem in English

Input: How many `r`s  in strawberry?

Output: To determine how many `r`s are in the word "strawberry," we will follow a step-by-step approach:

1. **Identify the word**: The word we need to analyze is "strawberry."
2. **List each letter**: Write down each letter in the word to ensure we don't miss any.
   - s
   - t
   - r
   - a
   - w
   - b
   - e
   - r
   - r
   - y
3. **Count the occurrences of `r`**: Now, we count how many times the letter `r` appears in the list.
   - The first `r` is in the third position.
   - The second `r` is in the seventh position.
   - The third `r` is in the ninth position.
4. **Verify the count**: Double-check that we have counted all the `r`s correctly and haven't missed any or counted any other letters as `r`.

After following these steps, we find that there are three `r`s in the word "strawberry."

Therefore, the number of `r`s in strawberry is \(\boxed{3}\).
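The count in this example is easy to confirm directly. The sketch below lists the 1-based positions of a letter in a word; `letter_positions` is a hypothetical helper name. For "strawberry" it reports positions 3, 8, and 9, so the total of three matches the answer above even though the positions differ slightly from those stated in the step-by-step listing.

```python
from typing import List


def letter_positions(word: str, letter: str) -> List[int]:
    """Return the 1-based positions at which `letter` occurs in `word`."""
    return [i for i, ch in enumerate(word, start=1) if ch == letter]


if __name__ == "__main__":
    positions = letter_positions("strawberry", "r")
    print(positions, len(positions))
```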

Code Problem

Input: Given an array of integers `nums` with a length of `n` and an integer `k`, please determine if there exist two adjacent subarrays of length `k` that are strictly increasing. Specifically, you need to check if there are two subarrays starting from indices `a` and `b` (where `a < b`) that meet the following conditions:
- Both subarrays `nums[a..a + k - 1]` and `nums[b..b + k - 1]` are strictly increasing.
- These two subarrays must be adjacent, meaning `b = a + k`.
If such two subarrays can be found, return `true`; otherwise, return `false`.
A subarray is a contiguous sequence of elements within an array.

Example 1:
Input: `nums = [2,5,7,8,9,2,3,4,3,1]`, `k = 3`
Output: `true`
Explanation:
The subarray starting from index 2 is `[7,8,9]`, which is strictly increasing. The subarray starting from index 5 is `[2,3,4]`, which is also strictly increasing. The two subarrays are adjacent, so the result is `true`.

Example 2:
Input: `nums = [1,2,3,4,4,4,4,5,6,7]`, `k = 5`
Output: `false`

Hints:
- `2 <= nums.length <= 1000`
- `1 <= 2 * k <= nums.length`
- `-1000 <= nums[i] <= 1000`

Please output the answer in the following format:

Output:

### Planning Phase

#### Key Steps:
1. **Iterate through the array**: Start from the first element and check every possible starting point for a subarray of length `k`.
2. **Check for strictly increasing subarray**: For each starting point, check if the subarray of length `k` is strictly increasing.
3. **Check the next subarray**: If the current subarray is strictly increasing, check if the next subarray of length `k` starting from the next element is also strictly increasing.
4. **Return the result**: If both subarrays are strictly increasing and adjacent, return `true`. If no such pair is found after checking all possible starting points, return `false`.

#### Important Considerations:
- **Edge Cases**: 
  - Arrays with fewer than `2*k` elements.
  - Arrays where no subarray of length `k` is strictly increasing.
  - Arrays where multiple subarrays of length `k` are strictly increasing, but none are adjacent.
- **Pitfalls**:
  - Ensure that the subarrays are strictly increasing, not just non-decreasing.
  - Handle cases where the array length is exactly `2*k`.

### Code Solution


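class Solution(object):
    def hasincreasingSubarrays(self, nums, k):
        # Iterate over every start index i for which both nums[i..i+k-1] and
        # nums[i+k..i+2k-1] stay in bounds. A bound of len(nums) - k + 1 would
        # let the second check index past the end of the array.
        for i in range(len(nums) - 2 * k + 1):
            # Check if the current subarray is strictly increasing
            is_increasing = True
            for j in range(1, k):
                if nums[i + j] <= nums[i + j - 1]:
                    is_increasing = False
                    break

            # If the current subarray is strictly increasing, check the next subarray
            if is_increasing:
                # Check if the adjacent subarray starting at i + k is strictly increasing
                for j in range(1, k):
                    if nums[i + k + j] <= nums[i + k + j - 1]:
                        break
                else:
                    # The loop finished without a break: both subarrays are increasing.
                    # The source listing is cut off after this loop, so this for-else
                    # completion is an assumed reconstruction.
                    return True

        # No adjacent pair of strictly increasing subarrays was found
        return False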
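For comparison, the problem statement above can be checked with a compact standalone function and the two worked examples. This is a minimal brute-force sketch written for this page, not the model's output: `has_increasing_subarrays` is a hypothetical name, and the approach simply tests every valid start index `a` together with `b = a + k`.

```python
from typing import List


def has_increasing_subarrays(nums: List[int], k: int) -> bool:
    """True if nums[a..a+k-1] and nums[a+k..a+2k-1] are both strictly increasing."""

    def is_increasing(start: int) -> bool:
        # Compare each element of the length-k window with its successor.
        return all(nums[i] < nums[i + 1] for i in range(start, start + k - 1))

    # a + 2k - 1 must be a valid index, so a ranges over 0 .. len(nums) - 2k.
    return any(
        is_increasing(a) and is_increasing(a + k)
        for a in range(len(nums) - 2 * k + 1)
    )


if __name__ == "__main__":
    print(has_increasing_subarrays([2, 5, 7, 8, 9, 2, 3, 4, 3, 1], 3))
    print(has_increasing_subarrays([1, 2, 3, 4, 4, 4, 4, 5, 6, 7], 5))
```

The running time is O(n·k), which is comfortable for the stated limit of 1000 elements; a single pass that tracks run lengths would bring it down to O(n) if needed.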

📄 License

The license for this project is listed as "other".

Property	Details
Base Model	meta-llama/Meta-Llama-3.1-8B-Instruct
Pipeline Tag	text-generation
License	other
