
Thinkless 1.5B RL DeepScaleR

Developed by Vinnnf
Thinkless is a large language model trained via reinforcement learning that adaptively selects between short-form and long-chain reasoning modes, significantly reducing inference compute costs.
Downloads 197
Release Date: May 16, 2025

Model Overview

The Thinkless framework trains large language models with the Decoupled Group Relative Policy Optimization (DeGRPO) algorithm, enabling them to adaptively choose between short-form reasoning (marked with a <short> tag) and long-chain reasoning (marked with a <think> tag) based on problem complexity.
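As an illustration of the control-tag mechanism described above, a caller can route a completion by inspecting its leading tag. The helper below is a minimal sketch, not part of any released API; it only assumes the <short> and <think> tag names mentioned in the overview:

```python
def reasoning_mode(completion: str) -> str:
    """Return which reasoning mode a Thinkless-style completion selected,
    based on the leading control tag (<short> or <think>)."""
    stripped = completion.lstrip()
    if stripped.startswith("<short>"):
        return "short"
    if stripped.startswith("<think>"):
        return "long"
    return "unknown"

# An easy problem answered tersely vs. a hard one answered with a chain.
print(reasoning_mode("<short> 2 + 2 = 4"))                  # short
print(reasoning_mode("<think> Let x be the unknown ... "))  # long
```

In practice the tag would be emitted by the model itself at the start of generation; downstream systems can use it to budget decoding length.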

Model Features

Adaptive Reasoning Mode
The model can automatically choose between short or long-chain reasoning modes based on problem complexity, reducing long-chain reasoning usage by 50%-90%.
Decoupled Group Relative Policy Optimization
Employs the DeGRPO algorithm to decompose the mixed reasoning objective into a control-tag loss and a response loss, avoiding the collapse issues of the original GRPO.
Efficient Inference
Significantly reduces inference computational costs while maintaining accuracy, suitable for applications requiring rapid responses.
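The decoupling idea behind DeGRPO can be sketched as follows. This is an illustrative toy, not the paper's published equations: the separate averaging of the two terms and the alpha weight are assumptions made for exposition. The point is that the single control-tag token gets its own loss term instead of being drowned out by the much longer response:

```python
import numpy as np

def degrpo_loss(control_logp, response_logp, advantages, alpha=1.0):
    """Illustrative decoupled policy-gradient objective (assumed form).

    control_logp  : (N,) log-prob of each sample's mode tag (<short>/<think>)
    response_logp : (N,) mean log-prob over each sample's response tokens
    advantages    : (N,) group-relative advantages
    alpha         : assumed hyperparameter balancing the two terms
    """
    # Each term is averaged separately, so the one-token control loss
    # carries weight comparable to the many-token response loss.
    control_term = -np.mean(advantages * control_logp)
    response_term = -np.mean(advantages * response_logp)
    return alpha * control_term + response_term

loss = degrpo_loss(
    control_logp=np.array([-1.0, -1.0]),
    response_logp=np.array([-2.0, -2.0]),
    advantages=np.array([1.0, 1.0]),
)
print(loss)  # 3.0
```

By contrast, a coupled objective that averages over all tokens at once would weight the mode tag by roughly 1/sequence_length, which is the imbalance the decoupling is described as fixing.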

Model Capabilities

Mathematical Reasoning
Question Answering
Adaptive Reasoning Mode Selection

Use Cases

Education
Mathematical Problem Solving
Answers mathematical problems ranging from simple to complex, automatically selecting short or detailed reasoning modes based on difficulty.
Performs well on benchmarks such as Minerva Algebra, MATH-500, and GSM8K
Intelligent Assistant
Efficient Q&A System
Builds an intelligent Q&A system that adaptively adjusts response length based on problem complexity.
Reduces unnecessary long-chain reasoning, improving response speed