Thinkless-1.5B-RL-DeepScaleR: an open-source large language model that cuts compute cost with adaptive reasoning modes

Thinkless 1.5B RL DeepScaleR

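Developed by Vinnnf

Thinkless is a large language model trained with reinforcement learning. It adaptively chooses between a concise answer and long-chain reasoning, which substantially reduces inference compute cost.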

Large language model

Transformers

Open-source license: Apache-2.0 #AdaptiveReasoning #ComputeEfficiencyOptimization #MathematicalReasoning

Downloads: 197

Release date: 5/16/2025

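Model overview

The Thinkless framework trains a large language model with the Decoupled Group Relative Policy Optimization (DeGRPO) algorithm so that the model can adaptively choose, according to the complexity of the problem, between a concise answer (the <short> tag) and long-chain reasoning (the <think> tag).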

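Model features

Adaptive reasoning modes

The model automatically selects a concise answer or long-chain reasoning according to the complexity of the problem, reducing the use of long-chain reasoning by 50%-90% on several benchmarks.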

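Decoupled Group Relative Policy Optimization

The DeGRPO algorithm decomposes the hybrid reasoning objective into a control-token loss and a response loss, which avoids the collapse problem seen with the original GRPO.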
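
The split described above can be pictured with a toy calculation. The sketch below is not the authors' implementation and does not reproduce the exact objective from the paper. It assumes each sampled response has already been reduced to a list of per-token loss terms whose first entry belongs to the control token; the function names, the sample values, and the weight `alpha` are all hypothetical. It contrasts a single length-normalized average, in which the control token's share shrinks as the response grows, with a decoupled form that weights the control token independently of response length.

```python
from typing import List


def length_normalized_loss(token_losses: List[float]) -> float:
    """Single average over all tokens, control token included."""
    return sum(token_losses) / len(token_losses)


def decoupled_loss(token_losses: List[float], alpha: float) -> float:
    """Control-token term (weighted by the hypothetical `alpha`) plus the
    mean of the response-token terms."""
    control, response = token_losses[0], token_losses[1:]
    response_mean = sum(response) / len(response) if response else 0.0
    return alpha * control + response_mean


def control_share(num_response_tokens: int) -> float:
    """Weight the control token receives under plain length normalization."""
    return 1.0 / (1 + num_response_tokens)


# Made-up per-token loss terms; index 0 is the control token.
short_sample = [2.0, 1.0, 3.0]        # control token + 2 response tokens
long_sample = [2.0] + [1.0] * 999     # control token + 999 response tokens

# Under one shared average, the control token's weight depends on length.
print(control_share(2), control_share(999))

# In the decoupled form, the control token's weight is fixed at `alpha`.
print(decoupled_loss(short_sample, alpha=0.5))
print(decoupled_loss(long_sample, alpha=0.5))
```

In this toy setting the control token of a 999-token response carries far less weight under the shared average than that of a 2-token response, while the decoupled form treats both the same. Whether this matches the paper's analysis in detail should be checked against the paper itself.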

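Efficient inference

Substantially reduces inference compute cost while maintaining accuracy, making the model suitable for applications that need fast responses.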

Model capabilities

Mathematical reasoning

Problem solving

Adaptive reasoning-mode selection

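Use cases

Education

Math problem solving

Solves problems ranging from simple to complex, automatically choosing between a concise answer and detailed reasoning according to the difficulty of the problem.

Performs well on benchmarks such as Minerva Algebra, MATH-500, and GSM8K.

Intelligent assistants

Efficient question-answering systems

Build question-answering systems that adapt the length of the answer to the complexity of the question.

Reduces unnecessary long-chain reasoning and improves response speed.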

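🚀 Thinkless: LLM Learns When to Think

Thinkless is a learnable framework that enables an LLM to adaptively choose between short-form and long-form reasoning based on task complexity and the model's own ability. This substantially reduces the compute cost of reasoning language models.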

Attribute	Details
Model type	Text generation
Base model	Vinnnf/Thinkless-1.5B-Warmup
Training data	agentica-org/DeepScaleR-Preview-Dataset
Library	transformers
Paper	arXiv:2505.13379
GitHub	VainF/Thinkless
RL model	Thinkless-1.5B-RL-DeepScaleR
Warmup model	Thinkless-1.5B-Warmup
Warmup data	Hybrid-OpenThoughts2-1M-1.5B
RL data	agentica-org/DeepScaleR-Preview-Dataset

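✨ Key features

Trained under a reinforcement learning paradigm, the model uses two control tokens (<short> and <think>) to choose between short-form and long-form reasoning.
The Decoupled Group Relative Policy Optimization (DeGRPO) algorithm decomposes the hybrid reasoning objective into two components, which stabilizes training.
On several benchmarks, Thinkless reduces the use of long-chain thinking by 50%-90%, substantially lowering the compute cost of reasoning language models.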
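
To measure how often the model falls back to long-form reasoning on your own prompts, one option is to count the responses that contain the `<think>` control token. The sketch below assumes the decoded responses still contain the control tokens, as the Quick Start check `"<think>" in response` implies. The function name `think_rate` and the sample strings are hypothetical.

```python
from typing import Iterable


def think_rate(responses: Iterable[str]) -> float:
    """Fraction of responses that used long-form mode (contain "<think>")."""
    responses = list(responses)
    if not responses:
        return 0.0
    return sum("<think>" in r for r in responses) / len(responses)


# Made-up decoded outputs, for illustration only.
samples = [
    "<short>\nThe answer is 2.",
    "<think>\nLet me work through this...</think>\nThe answer is 27.",
    "<short>\nThe answer is 17.",
    "<short>\nThe answer is 3.",
]
rate = think_rate(samples)
print(f"Long-form reasoning used in {rate:.0%} of responses")
```

Comparing this rate against a baseline that always reasons at length gives a rough, do-it-yourself view of the kind of reduction reported above.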

🚀 Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Vinnnf/Thinkless-1.5B-RL-DeepScaleR"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

instruction = "Please reason step by step, and put your final answer within \\boxed{}."
prompt = "The arithmetic mean of 7, 2, $x$ and 10 is 9. What is the value of $x$?"
# prompt = "What is the smallest positive perfect cube that can be written as the sum of three consecutive integers?"
# prompt = "How many r's are in the word \"strawberry\""

messages = [
    {"role": "user", "content": f"{instruction}\n{prompt}"},
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# text = text + "<think>" # Uncomment this to force thinking mode

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384,
    do_sample=True,
    temperature=0.6,
    top_p=0.95
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
num_tokens = len(generated_ids[0])

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

think_mode = ("<think>" in response)

print(text+response)
print(f"\nThink Mode: {think_mode}")
print(f"Number of tokens: {num_tokens}")
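
Because the instruction asks the model to put its final answer inside `\boxed{}`, a small helper can pull that answer out of `response` for automated checking. This is a sketch rather than part of the Thinkless codebase: the helper name `extract_boxed` is hypothetical, and it assumes the last `\boxed{...}` in the text holds the final answer.

```python
from typing import Optional


def extract_boxed(response: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `response`, or None."""
    marker = "\\boxed{"
    start = response.rfind(marker)
    if start == -1:
        return None
    depth = 1          # we are just inside the opening brace of \boxed{
    chars = []
    for ch in response[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None        # braces never balanced


print(extract_boxed("So, the final answer is \\(\\boxed{2}\\)."))
print(extract_boxed("Thus x = \\boxed{\\frac{1}{2}}"))
```

Tracking brace depth, rather than using a simple regular expression, keeps nested LaTeX such as `\frac{1}{2}` intact.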

💻 Usage examples

Basic usage

Basic usage is identical to the Quick Start code above.

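Advanced usage

# In advanced scenarios, you can specify the control token manually.
# For example, to force thinking mode (this reuses `model`, `tokenizer`,
# and `text` from the Quick Start code):
text = text + "<think>"
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384,
    do_sample=True,
    temperature=0.6,
    top_p=0.95
)
# Keep only the newly generated tokens (drop the prompt tokens).
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
num_tokens = len(generated_ids[0])

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

# The forced "<think>" is part of the prompt and was sliced off above, so it
# does not appear in `response`; record the mode explicitly instead.
think_mode = True

print(text+response)
print(f"\nThink Mode: {think_mode}")
print(f"Number of tokens: {num_tokens}")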
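
If you switch modes often, the string concatenation can be wrapped in a small helper. The sketch below is illustrative: `with_forced_mode` and `CONTROL_TOKENS` are hypothetical names, and forcing `<short>` in the same way as `<think>` is an assumption, since the example above only demonstrates `<think>`.

```python
# Control tokens named in this model card.
CONTROL_TOKENS = {"think": "<think>", "short": "<short>"}


def with_forced_mode(text: str, mode: str) -> str:
    """Append the control token for `mode` to an already-templated prompt."""
    if mode not in CONTROL_TOKENS:
        raise ValueError(f"mode must be one of {sorted(CONTROL_TOKENS)}, got {mode!r}")
    return text + CONTROL_TOKENS[mode]


# Stand-in for the string returned by tokenizer.apply_chat_template(...).
templated_prompt = "<chat-templated prompt>"
print(with_forced_mode(templated_prompt, "think"))
```

The returned string can then be tokenized and passed to `model.generate` exactly as in the snippet above.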

📚 Documentation

Case study

User:

1+1=?

Thinkless-1.5B:

<short>
To solve the problem \(1 + 1\), we will follow a step-by-step approach:

1. **Identify the numbers involved**: We have the numbers 1 and 1.
2. **Understand the operation**: The operation we need to perform is addition.
3. **Perform the addition**: When we add 1 and 1 together, we get 2.

So, the final answer is \(\boxed{2}\).

DeepSeek-R1:

Okay, so I need to figure out what 1 plus 1 equals. Let me start by thinking about basic addition. When you add two numbers together, you're combining their values. So if I have one apple and someone gives me another apple, how many apples do I have in total? That should be two apples.

Wait, let me make sure I'm not missing something here. In the simplest terms, addition is taking two quantities and putting them together. So 1 plus 1 means combining a single unit with another single unit. If I use my fingers, holding up one finger on my left hand and one on my right, then when I put them together, I have two fingers up. That seems right.

But maybe there's a trick to this question. Sometimes problems that look easy might have a different answer depending on the context. For example, in binary, 1 + 1 equals 10. But the question doesn't specify a number system, so I should assume it's in the decimal system, which is the default. So in decimal, 1 + 1 is definitely 2.

Another way to check is by using the number line. Starting at 1, if I move one unit to the right, I land on 2. That confirms it. Or using basic arithmetic rules: the sum of 1 and 1 is the next number in the sequence after 1, which is 2.

I guess there's no ambiguity here. Unless the question is trying to be philosophical or something, but mathematically, it's straightforward. So the answer should be 2.

The sum of 1 and 1 is calculated as follows:

1 + 1 = 2

In decimal arithmetic, adding the numbers 1 and 1 results in the number 2.

User:

What is the smallest positive perfect cube that can be written as the sum of three consecutive integers?

Thinkless-1.5B:
Note: The long reasoning process is omitted.

<think>
Okay, let's try to figure out the smallest positive perfect cube that can be written as the sum of three consecutive integers. Hmm, let's start by understanding the problem step by step.

First, a perfect cube is a number raised to the power of 3. So, we need to find the smallest positive integer \( n \) such that \( n = a + (a+1) + (a+2) \), where \( a \) is an integer. Then, \( n \) should also be a perfect cube. Let me write that down:

\( n = a + (a+1) + (a+2) \)

...

Next, 27:

Is 27 a perfect cube? Yes, \( 3^3 = 27 \). Let's check:

...
**Final Answer**
The smallest positive perfect cube that can be written as the sum of three consecutive integers is \boxed{27}.
</think>

To find the smallest positive perfect cube that can be written as the sum of three consecutive integers, we start by expressing the sum of three consecutive integers. Let the three consecutive integers be \( a \), \( a+1 \), and \( a+2 \). Their sum is:

\[
a + (a+1) + (a+2) = 3a + 3
\]

This sum must be a perfect cube. Let \( n \) be the perfect cube, so we have:

\[
n = 3(a + 1)
\]

Thus, \( n \) must be a multiple of 3. The smallest perfect cubes are 1, 8, 27, 64, 125, etc. We check each of these to see if they can be written as \( 3(a + 1) \):

- For \( n = 1 \): \( 3(a + 1) = 1 \) gives \( a = -\frac{2}{3} \), not an integer.
- For \( n = 8 \): \( 3(a + 1) = 8 \) gives \( a = \frac{5}{3} \), not an integer.
- For \( n = 27 \): \( 3(a + 1) = 27 \) gives \( a = 8 \), which is an integer. The three consecutive integers are 8, 9, and 10, and their sum is \( 8 + 9 + 10 = 27 \).

Checking the next perfect cubes (64, 125, etc.) confirms they do not yield integer values for \( a \). Therefore, the smallest positive perfect cube that can be written as the sum of three consecutive integers is:

\[
\boxed{27}
\]
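
The answer in this case study is easy to confirm independently. The short check below relies only on the identity a + (a+1) + (a+2) = 3(a+1) used in the response; the function name and the search limit are arbitrary choices made for illustration.

```python
from typing import Tuple


def smallest_cube_as_three_consecutive(limit: int = 100) -> Tuple[int, int]:
    """Return (n, a): the smallest positive cube n = a + (a+1) + (a+2)."""
    for k in range(1, limit + 1):
        n = k ** 3
        # a + (a+1) + (a+2) = 3(a+1), so n must be a multiple of 3.
        if n % 3 == 0:
            a = n // 3 - 1
            assert a + (a + 1) + (a + 2) == n
            return n, a
    raise ValueError("no such cube found below the search limit")


n, a = smallest_cube_as_three_consecutive()
print(f"{n} = {a} + {a + 1} + {a + 2}")
```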

📄 License

This project is released under the Apache-2.0 license.

📚 Citation

If you find this work useful, please cite the following paper.

@article{fang2025thinkless,
  title={Thinkless: LLM Learns When to Think},
  author={Fang, Gongfan and Ma, Xinyin and Wang, Xinchao},
  journal={arXiv preprint arXiv:2505.13379},
  year={2025}
}