Thinkless-1.5B-RL-DeepScaleR Open-Source Large Language Model - Adaptive Reasoning Modes Reduce Computational Cost


Thinkless 1.5B RL DeepScaleR

Developed by Vinnnf

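Thinkless is a large language model trained with reinforcement learning that adaptively chooses between a short-answer mode and a long-chain reasoning mode, substantially reducing inference compute cost.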

Large Language Model

Transformers

License: Apache-2.0 #AdaptiveReasoning #ComputeEfficiency #MathReasoning

Downloads: 197

Release date: 5/16/2025

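Model Overview

The Thinkless framework trains large language models with the Decoupled Group Relative Policy Optimization (DeGRPO) algorithm, enabling them to choose, according to problem complexity, between a short-answer mode (the <short> token) and a long-chain reasoning mode (the <think> token).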
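Since the chosen mode is signalled by the first token of the response, it can be recovered by checking that leading tag. A minimal sketch, assuming the decoded text starts with `<short>` or `<think>` (possibly after whitespace); `split_mode` is a hypothetical helper, not part of the Thinkless release:

```python
from typing import Optional, Tuple

# Control tokens named in the model card
SHORT_TOKEN = "<short>"
THINK_TOKEN = "<think>"


def split_mode(response: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: return (mode, body) for a decoded response.

    mode is "short", "think", or None when no control token leads the text.
    """
    stripped = response.lstrip()
    if stripped.startswith(SHORT_TOKEN):
        return "short", stripped[len(SHORT_TOKEN):].lstrip()
    if stripped.startswith(THINK_TOKEN):
        return "think", stripped[len(THINK_TOKEN):].lstrip()
    # No leading control token: report the text unchanged
    return None, response


print(split_mode("<short>\nSo, the final answer is 2."))
print(split_mode("<think>\nOkay, let's try...</think>\nThe answer is 27."))
```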

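Model Features

Adaptive Reasoning Mode

The model automatically chooses between the short-answer and long-chain reasoning modes based on problem complexity, reducing the use of long-chain reasoning by 50%-90% across several benchmarks.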
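The 50%-90% figure refers to how often the long-chain mode is chosen. To measure the same quantity on your own prompts, one option is to record the `think_mode` flag and `num_tokens` count from each generation (as the quick-start script prints them) and aggregate them. The helpers `mode_stats` and `think_reduction` are hypothetical, and the default baseline assumes a reasoning model that always thinks (ratio 1.0):

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class ModeStats:
    n: int              # number of generations
    think_ratio: float  # fraction that used the <think> mode
    avg_tokens: float   # mean generated tokens per sample


def mode_stats(samples: Iterable[Tuple[bool, int]]) -> ModeStats:
    """Aggregate (think_mode, num_tokens) pairs from individual generations."""
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one sample")
    n_think = sum(1 for think, _ in samples if think)
    total_tokens = sum(tokens for _, tokens in samples)
    return ModeStats(n=len(samples), think_ratio=n_think / len(samples),
                     avg_tokens=total_tokens / len(samples))


def think_reduction(adaptive_ratio: float, baseline_ratio: float = 1.0) -> float:
    """Relative drop in long-chain usage versus a baseline (1.0 = always thinks)."""
    return 1.0 - adaptive_ratio / baseline_ratio


# Illustrative placeholder values, not measured results
stats = mode_stats([(False, 120), (True, 3000), (False, 90), (False, 150)])
print(stats)                               # ModeStats(n=4, think_ratio=0.25, avg_tokens=840.0)
print(think_reduction(stats.think_ratio))  # 0.75
```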

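Decoupled Group Relative Policy Optimization

The DeGRPO algorithm decomposes the hybrid-reasoning objective into a control-token loss and a response loss, avoiding the collapse problem observed with vanilla GRPO.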
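A minimal pure-Python sketch of one possible decoupled objective, based only on the description above and not the authors' implementation. Assumptions: each sampled response in a group contributes one control token plus `T_i` response tokens; probability ratios and rewards are given as plain numbers; advantages are group-normalized as in GRPO; the PPO-style clipping and the control-token weight `alpha` are hyperparameters introduced here for illustration; the KL penalty is omitted.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: each reward normalized against its own sampling group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(ratio: float, adv: float, clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate for one token (ratio = pi_new / pi_old)."""
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * adv, clipped * adv)


def decoupled_objective(
    ctrl_ratios: Sequence[float],            # ratio of the <short>/<think> token, per sample
    resp_ratios: Sequence[Sequence[float]],  # ratios of the response tokens, per sample (non-empty)
    rewards: Sequence[float],                # one scalar reward per sample
    alpha: float,                            # control-token weight (illustrative hyperparameter)
) -> float:
    """Objective to maximize: control-token and response terms normalized separately."""
    advs = group_advantages(rewards)
    total = 0.0
    for r_ctrl, r_resp, adv in zip(ctrl_ratios, resp_ratios, advs):
        # Control token: fixed weight alpha, independent of response length
        ctrl_term = alpha * clipped_surrogate(r_ctrl, adv)
        # Response tokens: averaged over this response's own length T_i
        resp_term = sum(clipped_surrogate(r, adv) for r in r_resp) / len(r_resp)
        total += ctrl_term + resp_term
    return total / len(advs)


# On-policy example (all ratios = 1): advantages sum to zero, so the value is ~0
print(decoupled_objective([1.0, 1.0], [[1.0] * 5, [1.0] * 50], [1.0, 0.0], alpha=0.001))
```

In this sketch the control-token term is scaled by `alpha` while each response term is averaged over that response's own length, so the two contributions can be tuned independently.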

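Efficient Inference

Significantly reduces inference compute cost while maintaining accuracy, making it suitable for applications that need fast responses.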

Model Capabilities

Mathematical reasoning

Question answering

Adaptive reasoning-mode selection

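Use Cases

Education

Math problem solving

Solves math problems ranging from simple to complex, automatically choosing the short-answer or detailed-reasoning mode according to problem difficulty.

Performs well on benchmarks such as Minerva Algebra, MATH-500, and GSM8K.

Intelligent Assistants

Efficient question-answering systems

Build question-answering systems that adapt the length of their answers to the complexity of each question.

Reduces unnecessary long-chain reasoning and improves response speed.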

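🚀 Thinkless: LLM Learns When to Think

Thinkless is a learnable framework that enables an LLM to adaptively choose between short-form and long-form reasoning based on task complexity and the model's own ability. Trained under a reinforcement learning paradigm, it can significantly reduce the computational cost of reasoning language models.

🚀 Quick Start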

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Vinnnf/Thinkless-1.5B-RL-DeepScaleR"

# Load the model and tokenizer; dtype and device placement are chosen automatically
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

instruction = "Please reason step by step, and put your final answer within \\boxed{}."
prompt = "The arithmetic mean of 7, 2, $x$ and 10 is 9. What is the value of $x$?"
# prompt = "What is the smallest positive perfect cube that can be written as the sum of three consecutive integers?"
# prompt = "How many r's are in the word \"strawberry\""

messages = [
    {"role": "user", "content": f"{instruction}\n{prompt}"},
]

# Render the conversation as a string that ends with the assistant-turn prefix
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# text = text + "<think>" # Uncomment this to force thinking mode

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Sample a response; the model first emits a control token (<short> or <think>)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384,
    do_sample=True,
    temperature=0.6,
    top_p=0.95
)
# Keep only the newly generated tokens (drop the prompt)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
num_tokens = len(generated_ids[0])

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Thinking mode: the model emitted <think>, or it was forced at the end of the prompt
think_mode = ("<think>" in response) or text.endswith("<think>")

print(text+response)
print(f"\nThink Mode: {think_mode}")
print(f"Number of tokens: {num_tokens}")
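Because the prompt asks for the final answer inside `\boxed{}`, a small post-processing step can pull that answer out of `response`. This is a hypothetical helper, not part of the Thinkless or Transformers APIs. It takes the last `\boxed{...}`, since a thinking-mode response can contain a boxed answer both inside the `<think>` block and in the final summary, and it matches braces by depth so nested LaTeX survives:

```python
from typing import Optional


def extract_boxed(response: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a response, or None.

    Braces are matched by depth counting, so nested LaTeX such as
    \\boxed{\\frac{1}{2}} is returned whole.
    """
    start = response.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    for j in range(i, len(response)):
        ch = response[j]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return response[i:j]
    return None  # unbalanced braces, e.g. a generation cut off by max_new_tokens


print(extract_boxed(r"So, the final answer is \(\boxed{2}\)."))  # 2
```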

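✨ Key Features

We propose Thinkless, a framework that enables an LLM to adaptively choose between short-form and long-form reasoning based on task complexity and its own ability. It is trained under a reinforcement learning paradigm and uses two control tokens: <short> for concise responses and <think> for detailed reasoning. At its core is the Decoupled Group Relative Policy Optimization (DeGRPO) algorithm, which decomposes the learning objective of hybrid reasoning into a control-token loss and a response loss. This gives fine-grained control over the contribution of each objective, stabilizes training, and effectively prevents the collapse observed in vanilla GRPO. Across several benchmarks, Thinkless reduces the use of long-chain thinking by 50%-90%, significantly lowering the computational cost of reasoning language models.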
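To see why separating the two losses can matter, consider a vanilla GRPO loss that averages over every token of a sequence, control token included (an assumption about the baseline's normalization, made for illustration). The control token of a T-token response then receives weight 1/(T + 1), so mode-selection updates from short answers outweigh those from long ones roughly in proportion to their lengths. A decoupled loss can instead give the control token a fixed weight; `alpha=0.001` below is an arbitrary illustrative value:

```python
from typing import Dict


def control_token_weight(num_response_tokens: int, alpha: float) -> Dict[str, float]:
    """Weight on one sequence's control token under two normalization schemes."""
    return {
        # Vanilla (assumed): loss averaged over T response tokens + 1 control token
        "vanilla": 1.0 / (num_response_tokens + 1),
        # Decoupled: control token scaled by a fixed alpha, regardless of T
        "decoupled": alpha,
    }


short_w = control_token_weight(99, alpha=0.001)    # a 99-token short answer
long_w = control_token_weight(7999, alpha=0.001)   # a 7999-token reasoning chain
print(round(short_w["vanilla"] / long_w["vanilla"], 2))  # 80.0
print(short_w["decoupled"] / long_w["decoupled"])        # 1.0
```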

💻 Usage Examples

Case Studies

Simple question comparison

User question:

1+1=?

Thinkless-1.5B's answer:

<short>
To solve the problem \(1 + 1\), we will follow a step-by-step approach:

1. **Identify the numbers involved**: We have the numbers 1 and 1.
2. **Understand the operation**: The operation we need to perform is addition.
3. **Perform the addition**: When we add 1 and 1 together, we get 2.

So, the final answer is \(\boxed{2}\).

DeepSeek-R1's answer:

Okay, so I need to figure out what 1 plus 1 equals. Let me start by thinking about basic addition. When you add two numbers together, you're combining their values. So if I have one apple and someone gives me another apple, how many apples do I have in total? That should be two apples.

Wait, let me make sure I'm not missing something here. In the simplest terms, addition is taking two quantities and putting them together. So 1 plus 1 means combining a single unit with another single unit. If I use my fingers, holding up one finger on my left hand and one on my right, then when I put them together, I have two fingers up. That seems right.

But maybe there's a trick to this question. Sometimes problems that look easy might have a different answer depending on the context. For example, in binary, 1 + 1 equals 10. But the question doesn't specify a number system, so I should assume it's in the decimal system, which is the default. So in decimal, 1 + 1 is definitely 2.

Another way to check is by using the number line. Starting at 1, if I move one unit to the right, I land on 2. That confirms it. Or using basic arithmetic rules: the sum of 1 and 1 is the next number in the sequence after 1, which is 2.

I guess there's no ambiguity here. Unless the question is trying to be philosophical or something, but mathematically, it's straightforward. So the answer should be 2.

The sum of 1 and 1 is calculated as follows:

1 + 1 = 2

In decimal arithmetic, adding the numbers 1 and 1 results in the number 2.

Complex question example

User question:

What is the smallest positive perfect cube that can be written as the sum of three consecutive integers?

Thinkless-1.5B's answer:
Note: parts of the long reasoning process are omitted and marked with "...".

<think>
Okay, let's try to figure out the smallest positive perfect cube that can be written as the sum of three consecutive integers. Hmm, let's start by understanding the problem step by step.

First, a perfect cube is a number raised to the power of 3. So, we need to find the smallest positive integer \( n \) such that \( n = a + (a+1) + (a+2) \), where \( a \) is an integer. Then, \( n \) should also be a perfect cube. Let me write that down:

\( n = a + (a+1) + (a+2) \)

...

Next, 27:

Is 27 a perfect cube? Yes, \( 3^3 = 27 \). Let's check:

...
**Final Answer**
The smallest positive perfect cube that can be written as the sum of three consecutive integers is \boxed{27}.
</think>

To find the smallest positive perfect cube that can be written as the sum of three consecutive integers, we start by expressing the sum of three consecutive integers. Let the three consecutive integers be \( a \), \( a+1 \), and \( a+2 \). Their sum is:

\[
a + (a+1) + (a+2) = 3a + 3
\]

This sum must be a perfect cube. Let \( n \) be the perfect cube, so we have:

\[
n = 3(a + 1)
\]

Thus, \( n \) must be a multiple of 3. The smallest perfect cubes are 1, 8, 27, 64, 125, etc. We check each of these to see if they can be written as \( 3(a + 1) \):

- For \( n = 1 \): \( 3(a + 1) = 1 \) gives \( a = -\frac{2}{3} \), not an integer.
- For \( n = 8 \): \( 3(a + 1) = 8 \) gives \( a = \frac{5}{3} \), not an integer.
- For \( n = 27 \): \( 3(a + 1) = 27 \) gives \( a = 8 \), which is an integer. The three consecutive integers are 8, 9, and 10, and their sum is \( 8 + 9 + 10 = 27 \).

Checking the next perfect cubes (64, 125, etc.) confirms they do not yield integer values for \( a \). Therefore, the smallest positive perfect cube that can be written as the sum of three consecutive integers is:

\[
\boxed{27}
\]
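The model's answer can be checked independently. Since a + (a+1) + (a+2) = 3a + 3, a cube n works exactly when n is divisible by 3; a brute-force search over k³ confirms that 27 is the first such cube (this check is ours, not part of the model output):

```python
from typing import Optional, Tuple


def smallest_cube_sum_of_three_consecutive(
    max_k: int = 100,
) -> Optional[Tuple[int, Tuple[int, int, int]]]:
    """Return (n, (a, a+1, a+2)) for the smallest cube n = k**3 with n = 3a + 3."""
    for k in range(1, max_k + 1):
        n = k ** 3
        # n = 3a + 3 has an integer solution a exactly when n is divisible by 3
        if n % 3 == 0:
            a = n // 3 - 1
            return n, (a, a + 1, a + 2)
    return None


print(smallest_cube_sum_of_three_consecutive())  # (27, (8, 9, 10))
```

Because 3 is prime, 3 divides k³ only when 3 divides k, so every cube of a multiple of 3 also qualifies (for example, 216 = 71 + 72 + 73); 27 is simply the smallest.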

📚 Detailed Documentation

Model Information

Attribute	Details
Model type	Text generation
Training data	agentica-org/DeepScaleR-Preview-Dataset
Base model	Vinnnf/Thinkless-1.5B-Warmup
Library	transformers

Resource	Link
Paper	arXiv:2505.13379
GitHub repository	VainF/Thinkless
RL model	Thinkless-1.5B-RL-DeepScaleR
Warmup model	Thinkless-1.5B-Warmup
Warmup data	Hybrid-OpenThoughts2-1M-1.5B
RL data	agentica-org/DeepScaleR-Preview-Dataset

[Figure: model pipeline]

📄 License

This project is licensed under the Apache-2.0 License.

📚 Citation

If you find this work helpful, please cite:

@article{fang2025thinkless,
  title={Thinkless: LLM Learns When to Think},
  author={Fang, Gongfan and Ma, Xinyin and Wang, Xinchao},
  journal={arXiv preprint arXiv:2505.13379},
  year={2025}
}