Skywork o1 Open-Llama-3.1-8B open-source chat model: stronger reasoning for high-quality question answering

Skywork O1 Open Llama 3.1 8B

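Developed by Skywork

Skywork o1 Open-Llama-3.1-8B is a strong chat model trained on Llama-3.1-8B; "slow thinking" style reasoning data significantly improves its reasoning ability.

Large language model

Safetensors

License: other #slow-thinking-reasoning #math-and-coding-enhancement #progressive-reward-training

Downloads: 583

Released: 11/26/2024

Model overview

An advanced model with o1-like slow-thinking reasoning, specialized in complex problems such as mathematics, coding, and logical reasoning.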

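Model features

Slow-thinking reasoning

Proprietary training methods give the model the ability to think deeply, plan, and reflect

Three-stage training scheme

Comprises reflective reasoning training, reinforcement learning for reasoning, and reasoning planning

Q* online reasoning algorithm

An advanced reasoning algorithm, implemented and publicly released for the first time, that optimizes the reasoning path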

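Model capabilities

Complex problem solving

Mathematical reasoning

Logical reasoning

Answering programming questions

Ethical decision analysis

Self-verification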

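Use cases

Education

Math problem solving

Solves a wide range of math word problems and equations

Performs well on math test sets such as GSM8K

Programming

Algorithm problem solving

Analyzes and solves algorithmic programming problems

Performs strongly on coding tests such as HumanEval

Logical analysis

Logical reasoning questions

Answers questions on logical relations and analogical reasoning

Demonstrates strong logical analysis ability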

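🚀 Skywork o1 Open model series

The Skywork o1 Open model series is developed by the Skywork team at Kunlun Inc. The models incorporate o1-like slow thinking and reasoning, solve complex reasoning problems effectively, and perform strongly across domains such as mathematics and logic.

🚀 Quick start

To run inference with Skywork-o1-Open-Llama3.1-8B, use the following code: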

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

system_prompt = """You are Skywork-o1, a thinking model developed by Skywork AI, specializing in solving complex problems involving mathematics, coding, and logical reasoning through deep thought. When faced with a user's request, you first engage in a lengthy and in-depth thinking process to explore possible solutions to the problem. After completing your thoughts, you then provide a detailed explanation of the solution process in your response."""

# An Example Case
problem = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"

user_message = problem

conversation = [
    {
        "role": "system",
        "content": system_prompt
    },
    {
        "role": "user", 
        "content": user_message
    }
]

model_name = "Skywork-o1-Open-Llama3.1-8B"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained(model_name)

input_ids = tokenizer.apply_chat_template(
    conversation, 
    tokenize=True, 
    add_generation_prompt=True,
    return_tensors="pt").to(model.device)

# Greedy decoding: with do_sample=False the sampling parameters (such as temperature) are not used.
generation = model.generate(
    input_ids=input_ids,
    max_new_tokens=2048,
    do_sample=False,
    pad_token_id=128009)

completion = tokenizer.decode(
    generation[0][len(input_ids[0]):], 
    skip_special_tokens=True, 
    clean_up_tokenization_spaces=True)

print(completion)
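As a sanity check on whatever the model prints, the arithmetic of the example problem can be worked by hand. The sketch below is only an illustration; the variable names are not part of the original code.

```python
# Worked check of the example problem used in the prompt above.
apples = 12
apples -= 4       # Jane gives 4 apples to Mark
apples += 1       # she buys 1 more apple
people = 1 + 2    # Jane plus her 2 siblings

per_person, leftover = divmod(apples, people)
print(per_person, leftover)  # 3 0
```

A correct completion should therefore end with 3 apples per person.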

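✨ Key features

Enhanced thinking and planning: the model thinks through and plans complex problems in depth to find the best solution.
Advanced self-reflection and self-verification: the model reflects on and verifies its reasoning process and results to help ensure accurate answers.
Strong reasoning: compared with earlier large models, the Skywork o1 Open series handles a wide range of reasoning challenges well, including common-sense, logical, mathematical, and ethical decision-making problems as well as logical traps.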

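📚 Detailed documentation

Model introduction

The Skywork o1 Open model series comprises three advanced models:

Skywork o1 Open-Llama-3.1-8B: a strong chat model trained on Llama-3.1-8B, whose reasoning is significantly improved with "o1-style" data.
Skywork o1 Open-PRM-Qwen-2.5-1.5B: a specialized model that strengthens reasoning through incremental process rewards, suited to complex problem solving at small scale.
Skywork o1 Open-PRM-Qwen-2.5-7B: extends the 1.5B model to handle more demanding reasoning tasks.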

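Training method

The cognitive abilities of the Skywork o1 Open series come from a three-stage training scheme:

Reflective reasoning training: a proprietary multi-agent system generates high-quality, diverse data for long-thinking tasks, followed by continued pre-training and supervised fine-tuning.
Reinforcement learning for reasoning: the Skywork o1 Process Reward Model (PRM) is introduced specifically to strengthen step-by-step reasoning. Experiments confirm that Skywork-PRM, combined with a proprietary reasoning reinforcement algorithm, effectively captures the influence of intermediate reasoning steps on the final result.
Reasoning planning: Tiangong's proprietary Q* online reasoning algorithm is deployed alongside model-based thinking to search for the optimal reasoning path. This is the first implementation and public release of the Q* algorithm, and it significantly improves the model's online reasoning ability.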
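To make the idea of step-level rewards concrete, here is a toy sketch of how a process reward model could be used to rank candidate reasoning chains. It is not Skywork-PRM or the Q* algorithm: `toy_step_reward`, `score_chain`, and `pick_best` are hypothetical names, the reward is a hard-coded stand-in for a learned model, and scoring a chain by its weakest step is just one common aggregation choice assumed here.

```python
from typing import Callable, List, Sequence

Step = str
Chain = List[Step]


def score_chain(chain: Sequence[Step],
                step_reward: Callable[[Sequence[Step]], float]) -> float:
    """Score a chain by its weakest step: reward every prefix, keep the minimum."""
    return min(step_reward(chain[: i + 1]) for i in range(len(chain)))


def pick_best(chains: Sequence[Chain],
              step_reward: Callable[[Sequence[Step]], float]) -> Chain:
    """Return the candidate chain with the highest aggregated step score."""
    return max(chains, key=lambda c: score_chain(c, step_reward))


def toy_step_reward(prefix: Sequence[Step]) -> float:
    """Hypothetical stand-in for a PRM: penalise one known arithmetic slip."""
    return 0.1 if "5/30 = 1/5" in prefix[-1] else 0.9


candidates = [
    ["1/15 + 1/10 = 5/30", "5/30 = 1/5", "x = 5"],  # contains a wrong step
    ["1/15 + 1/10 = 5/30", "5/30 = 1/6", "x = 6"],  # every step is sound
]

best = pick_best(candidates, toy_step_reward)
print(best[-1])  # x = 6
```

The point of the sketch is only that a single bad intermediate step can sink an otherwise plausible chain, which is the kind of signal a process reward is meant to capture.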

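Model highlights

Skywork o1 Open 8B: the model shows notable improvements across math and coding benchmarks, raising Llama-3.1-8B's performance to the top of its class and surpassing Qwen-2.5-7B instruct, the previous SOTA model of similar size.
Skywork o1 Open PRM: two advanced process reward models (PRMs) for reasoning tasks:
- Skywork o1 Open-PRM-Qwen2.5-1.5B: reaches the performance of 8B models, competing with advanced models such as RLHFlow's Llama3.1-8B-PRM-Deepseek-Data and OpenR's Math-psa-7B.
- Skywork o1 Open-PRM-Qwen2.5-7B: matches or surpasses larger models such as Qwen2.5-Math-RM-72B on most benchmarks, setting a new standard for AI reasoning.

The inference code is available here.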

💻 Usage examples

Basic usage

The following are reasoning examples from Skywork-o1-Open-Llama3.1-8B on different types of problems:

Chinese math problem

Input: 一項工程，山東隊獨做需15天完成，江蘇隊獨做需10天完成。兩隊合作，幾天可以完成？

Output: 好的,讓我們來解決這個有趣的工程問題。我會用輕鬆的語氣,一步步地分析這個問題。

首先,讓我們理解一下題目給出的信息:
1. 山東隊獨做需要15天完成工程
2. 江蘇隊獨做需要10天完成工程

現在,我們需要找出兩隊合作時的完成時間。讓我們從每隊的工作效率開始分析:

1. 山東隊的效率:
   15天完成1個工程,所以每天完成的工程量是 1/15

2. 江蘇隊的效率:
   10天完成1個工程,所以每天完成的工程量是 1/10

稍等,讓我們再檢查一下這個計算...沒錯,這個計算是正確的。

現在,兩隊合作時,每天完成的工程量是兩隊效率的和:
1/15 + 1/10 = 2/30 + 3/30 = 5/30 = 1/6

這意味著每天兩隊合作完成的工程量是1/6。

接下來,我們需要計算完成1個工程需要多少天。我們可以用以下等式:
(1/6) × x = 1

解這個等式:
x = 1 ÷ (1/6) = 6

所以,兩隊合作完成工程需要6天。

讓我們再驗證一下:
- 6天 × (1/6) = 1
- 這確實等於1,說明我們的計算是正確的。

因此,最終答案是:兩隊合作完成工程需要6天。
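The work-rate arithmetic in this sample (two teams that need 15 and 10 days alone) can be double-checked with exact fractions. This is a small illustrative sketch; the variable names are assumptions, not part of the model output.

```python
from fractions import Fraction

# Daily share of the project each team completes when working alone.
shandong_rate = Fraction(1, 15)
jiangsu_rate = Fraction(1, 10)

# Working together, the daily shares add up.
combined_rate = shandong_rate + jiangsu_rate
days = 1 / combined_rate

print(combined_rate, days)  # 1/6 6
```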

English math problem

Input: Determine the value of $y$ in its simplest form, where $y = \frac{p + q\sqrt{r}}{s}$, given that $\frac{7y}{5} - 2 = \frac{4}{y}$, and $p, q, r, s$ are integers. What is $\frac{pqr}{s}$?

Output: To determine the value of \( y \) in the form \( y = \frac{p + q\sqrt{r}}{s} \), we start with the given equation:

\[
\frac{7y}{5} - 2 = \frac{4}{y}
\]

First, we eliminate the denominators by multiplying both sides of the equation by \( 5y \):

\[
7y^2 - 10y = 20
\]

Next, we rearrange the equation to set it to zero:

\[
7y^2 - 10y - 20 = 0
\]

We solve this quadratic equation using the quadratic formula \( y = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \), where \( a = 7 \), \( b = -10 \), and \( c = -20 \). Plugging in these values, we get:

\[
y = \frac{-(-10) \pm \sqrt{(-10)^2 - 4 \cdot 7 \cdot (-20)}}{2 \cdot 7} = \frac{10 \pm \sqrt{100 + 560}}{14} = \frac{10 \pm \sqrt{660}}{14}
\]

We simplify \( \sqrt{660} \) as follows:

\[
\sqrt{660} = \sqrt{4 \cdot 165} = 2\sqrt{165}
\]

Substituting this back into the quadratic formula, we have:

\[
y = \frac{10 \pm 2\sqrt{165}}{14} = \frac{5 \pm \sqrt{165}}{7}
\]

Since we are asked for the value of \( y \) in the form \( \frac{p + q\sqrt{r}}{s} \), we can choose either the positive or negative root. Let's take the positive root for simplicity:

\[
y = \frac{5 + \sqrt{165}}{7}
\]

Here, \( p = 5 \), \( q = 1 \), \( r = 165 \), and \( s = 7 \). We need to find the value of \( \frac{pqr}{s} \):

\[
\frac{pqr}{s} = \frac{5 \cdot 1 \cdot 165}{7} = \frac{825}{7}
\]

Thus, the final answer is:

\[
\boxed{\frac{825}{7}}
\]
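The boxed result can be checked numerically. The sketch below substitutes the positive root back into the original equation and recomputes the requested ratio; it is an illustration with assumed variable names, not part of the model output.

```python
import math
from fractions import Fraction

# Positive root from the derivation: y = (5 + sqrt(165)) / 7.
y = (5 + math.sqrt(165)) / 7

# It should satisfy the original equation 7y/5 - 2 = 4/y.
lhs = 7 * y / 5 - 2
rhs = 4 / y
assert math.isclose(lhs, rhs, rel_tol=1e-9)

# With p = 5, q = 1, r = 165, s = 7 the requested quantity is pqr/s.
p, q, r, s = 5, 1, 165, 7
result = Fraction(p * q * r, s)
print(result)  # 825/7
```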

Chinese logic problem

Input: 火藥∶鞭炮∶二踢腳 可以對應下列選項中的：A.烏鐵∶刀具∶三稜刀\nB.紅磚∶建築∶四合院\nC.清水∶白酒∶五糧液\nD.杉木∶樂器∶六絃琴

Output: 好的，讓我們一起來解決這個邏輯關係題。我們需要找出與"火藥∶鞭炮∶二踢腳"最相似的關係。

首先，讓我們分析給定的關係：
火藥 → 鞭炮 → 二踢腳

1. 火藥是製作鞭炮的原料。
2. 鞭炮是由火藥製成的。
3. 二踢腳是鞭炮的一種。

現在，讓我們逐個檢查選項：

A. 烏鐵∶刀具∶三稜刀
   烏鐵是製作刀具的原料，刀具是由烏鐵製成的，三稜刀是刀具的一種。
   這個關係看起來很相似！

B. 紅磚∶建築∶四合院
   紅磚是建築的一種材料，但不是所有建築都用紅磚。
   四合院是建築的一種，但不是所有建築都叫四合院。
   這個關係不太匹配。

C. 清水∶白酒∶五糧液
   清水是製作白酒的原料，白酒是由清水製成的。
   但是五糧液是白酒的一種，而不是白酒的原料。
   這個關係不太匹配。

D. 杉木∶樂器∶六絃琴
   杉木是製作樂器的原料，樂器是由杉木製成的。
   但是六絃琴是樂器的一種，而不是樂器的原料。
   這個關係也不太匹配。

稍等，讓我再仔細思考一下...

在重新審視這些選項後，我更加確信A選項是最佳答案。讓我解釋一下為什麼：

1. 烏鐵作為原料，刀具作為產品，三稜刀作為具體產品的關係，與火藥、鞭炮、二踢腳的關係最為相似。
2. 每個選項都遵循了"原料 → 產品 → 具體產品"的模式。
3. 其他選項要麼缺少了原料和產品的關係（如B和C），要麼將產品和具體產品的關係顛倒了（如D）。

因此，最終答案是：A. 烏鐵∶刀具∶三稜刀。

這個選項最準確地反映了原始關係中的邏輯結構。

English logic problem

Input: How many `r`s  in strawberry?

Output: To determine how many `r`s are in the word "strawberry," we will follow a step-by-step approach:

1. **Identify the word**: The word we need to analyze is "strawberry."
2. **List each letter**: Write down each letter in the word to ensure we don't miss any.
   - s
   - t
   - r
   - a
   - w
   - b
   - e
   - r
   - r
   - y
3. **Count the occurrences of `r`**: Now, we count how many times the letter `r` appears in the list.
   - The first `r` is in the third position.
   - The second `r` is in the seventh position.
   - The third `r` is in the ninth position.
4. **Verify the count**: Double-check that we have counted all the `r`s correctly and haven't missed any or counted any other letters as `r`.

After following these steps, we find that there are three `r`s in the word "strawberry."

Therefore, the number of `r`s in strawberry is \(\boxed{3}\).
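The count is easy to confirm programmatically. A minimal sketch, using 1-indexed positions; note that the positions computed here come from the spelling itself and may not match the positions narrated in a given model output, while the count of three is the same.

```python
word = "strawberry"

# 1-indexed positions of every "r" in the word.
positions = [i for i, ch in enumerate(word, start=1) if ch == "r"]

print(len(positions), positions)  # 3 [3, 8, 9]
```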

Coding problem

Input: Given an array of integers `nums` with a length of `n` and an integer `k`, please determine if there exist two adjacent subarrays of length `k` that are strictly increasing. Specifically, you need to check if there are two subarrays starting from indices `a` and `b` (where `a < b`) that meet the following conditions:
- Both subarrays `nums[a..a + k - 1]` and `nums[b..b + k - 1]` are strictly increasing.
- These two subarrays must be adjacent, meaning `b = a + k`.
If such two subarrays can be found, return `true`; otherwise, return `false`.
A subarray is a contiguous sequence of elements within an array.

Example 1:
Input: `nums = [2,5,7,8,9,2,3,4,3,1]`, `k = 3`
Output: `true`
Explanation:
The subarray starting from index 2 is `[7,8,9]`, which is strictly increasing. The subarray starting from index 5 is `[2,3,4]`, which is also strictly increasing. The two subarrays are adjacent, so the result is `true`.

Example 2:
Input: `nums = [1,2,3,4,4,4,4,5,6,7]`, `k = 5`
Output: `false`

Hints:
- `2 <= nums.length <= 1000`
- `1 <= 2 * k <= nums.length`
- `-1000 <= nums[i] <= 1000`

Please output the answer in the following format:

Output: ### Planning Phase

#### Key Steps:
1. **Iterate through the array**: Start from the first element and check every possible starting point for a subarray of length `k`.
2. **Check for strictly increasing subarray**: For each starting point, check if the subarray of length `k` is strictly increasing.
3. **Check the next subarray**: If the current subarray is strictly increasing, check if the next subarray of length `k` starting from the next element is also strictly increasing.
4. **Return the result**: If both subarrays are strictly increasing and adjacent, return `true`. If no such pair is found after checking all possible starting points, return `false`.

#### Important Considerations:
- **Edge Cases**: 
  - Arrays with fewer than `2*k` elements.
  - Arrays where no subarray of length `k` is strictly increasing.
  - Arrays where multiple subarrays of length `k` are strictly increasing, but none are adjacent.
- **Pitfalls**:
  - Ensure that the subarrays are strictly increasing, not just non-decreasing.
  - Handle cases where the array length is exactly `2*k`.

### Code Solution


class Solution(object):
    def hasincreasingSubarrays(self, nums, k):
        # Iterate through the array, stopping at the last start index where two adjacent subarrays of length k still fit
        for i in range(len(nums) - 2 * k + 1):
            # Check if the current subarray is strictly increasing
            is_increasing = True
            for j in range(1, k):
                if nums[i + j] <= nums[i + j - 1]:
                    is_increasing = False
                    break
            
            # If the current subarray is strictly increasing, check the next subarray
            if is_increasing:
                # Check if the next subarray is strictly increasing
                for j in range(1, k):
                    if nums[i + k + j] <= nums[i + k + j - 1]:
                        break
                else:
                    # If the next subarray is also strictly increasing, return True
                    return True
        
        # If no such pair of subarrays is found, return False
        return False

### Solution Explanation

#### Detailed Explanation:
1. **Iterate through the array**:
   - We use a loop to iterate through the array from the first element to the last start index at which two adjacent subarrays of length `k` still fit (the loop runs over `range(len(nums) - 2 * k + 1)`), so the second subarray never goes out of bounds.

2. **Check for strictly increasing subarray**:
   - For each starting index `i`, we check if the subarray `nums[i:i+k]` is strictly increasing.
   - We use a nested loop to compare each element with the previous one. If any element is not greater than the previous one, we set `is_increasing` to `False` and break out of the loop.

3. **Check the next subarray**:
   - If the current subarray is strictly increasing, we then check the next subarray starting from `i + k`.
   - We use another nested loop to check if `nums[i+k:i+2*k]` is strictly increasing.
   - If the next subarray is also strictly increasing, we return `True`.

4. **Return the result**:
   - If we complete the loop without finding any such pair of subarrays, we return `False`.

#### Clarity and Breakdown:
- **Outer Loop**: The outer loop iterates through each possible starting point for a subarray of length `k`.
- **Inner Loop for Current Subarray**: The inner loop checks if the current subarray is strictly increasing.
- **Inner Loop for Next Subarray**: If the current subarray is strictly increasing, the inner loop checks if the next subarray is also strictly increasing.
- **Return True/False**: If both subarrays are strictly increasing and adjacent, we return `True`. If no such pair is found, we return `False`.

This approach ensures that we efficiently check all possible pairs of adjacent strictly increasing subarrays of length `k` in the given array.
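A compact sketch of the same check, written so that the start index `a` never exceeds `len(nums) - 2 * k`, which keeps the second window `nums[a + k .. a + 2k - 1]` inside the array. The function name `has_increasing_subarrays` and the helper `increasing` are illustrative, not part of the model output.

```python
from typing import List


def has_increasing_subarrays(nums: List[int], k: int) -> bool:
    """Return True if two adjacent length-k windows are both strictly increasing."""

    def increasing(start: int) -> bool:
        # Compare each element of the window with its successor.
        return all(nums[j] < nums[j + 1] for j in range(start, start + k - 1))

    # The second window ends at index a + 2k - 1, so a is at most len(nums) - 2k.
    return any(
        increasing(a) and increasing(a + k)
        for a in range(len(nums) - 2 * k + 1)
    )


print(has_increasing_subarrays([2, 5, 7, 8, 9, 2, 3, 4, 3, 1], 3))  # True
print(has_increasing_subarrays([1, 2, 3, 4, 4, 4, 4, 5, 6, 7], 5))  # False
```

An input such as `[1, 1, 2, 3]` with `k = 2` is a useful edge case: the only increasing window that has room for a neighbour does not exist, and the bounded loop returns `False` without indexing past the end of the list.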

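📄 License

Community use of Skywork models is subject to the Skywork Community License. Skywork models support commercial use. If you plan to use Skywork models or their derivatives for commercial purposes, you must comply with the terms and conditions of the Skywork Community License.

📞 Contact us

If you have any questions, please feel free to contact us at {jujie.he, liang.zhao, liang.zeng, tianwen.wei}@kunlun-inc.com.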

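⚠️ Disclaimer

We hereby declare that Skywork models must not be used for any activity that endangers national or social security or for any illegal purpose. We also ask users not to deploy Skywork models in internet services without appropriate security review and records. We hope all users will abide by this principle so that technological progress takes place in a regulated and lawful environment.

We have done our best to ensure that the data used in training complies with regulations. Nevertheless, despite these efforts, unforeseeable risks and issues may remain owing to the complexity of the model and the data. We therefore accept no responsibility for any problem arising from the use of Skywork open-source models, including but not limited to data security issues, public opinion risks, or any risks and problems caused by the model being misled, abused, disseminated, or improperly used.

📖 Citation

If you find our work helpful, please cite us with the following BibTeX entry: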

@misc{skyworkopeno12024,
  title={Skywork-o1 Open Series},
  author={He, Jujie and Wei, Tianwen and Yan, Rui and Liu, Jiacai and Wang, Chaojie and Gan, Yimeng and Tu, Shiwen and Liu, Chris Yuhao and Zeng, Liang and Wang, Xiaokun and Wang, Boyang and Li, Yongcong and Zhang, Fuxiang and Xu, Jiacheng and An, Bo and Liu, Yang and Zhou, Yahui},
  year={2024},
  month={November},
  howpublished={\url{https://huggingface.co/Skywork}},
  url={https://huggingface.co/Skywork},
}