Thinkless-1.5B-Warmup Open Source Model - The Thinkless Framework Enables Large Models to Adaptively Choose Between Long and Short Reasoning

Thinkless 1.5B Warmup

Developed by Vinnnf

Thinkless is a learnable framework that enables large models to adaptively choose between short reasoning and long-chain reasoning, based on task complexity and their own capabilities.

Large Language Model

Transformers

Open Source License: Apache-2.0 #Adaptive Reasoning #Reinforcement Learning Optimization #Mathematical Reasoning

Downloads 966

Release Time: 5/16/2025

Model Overview

This framework is trained using a reinforcement learning paradigm, employing two control tokens: <short> triggers concise responses, while <think> triggers detailed reasoning. The core method is the Decoupled Group Relative Policy Optimization (DeGRPO) algorithm, which decomposes the learning objective of hybrid reasoning into control token loss and response loss.

Model Features

Adaptive Reasoning

Automatically selects between short and long-chain reasoning modes based on task complexity

Decoupled Group Relative Policy Optimization

Uses the DeGRPO algorithm to decompose the learning objective into control token loss and response loss

Efficient Reasoning

Reduces the use of long-chain reasoning by 50%-90% in benchmark tests, significantly lowering computational costs
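
One way to check this kind of figure on your own outputs is to count how often a response opens with <think>. The sketch below is an illustration only, not the authors' evaluation code: `detect_mode` and `think_usage` are hypothetical helpers, and it assumes each logged response string still begins with its control token.

```python
THINK, SHORT = "<think>", "<short>"  # control tokens named in this model card


def detect_mode(response):
    """Return "think", "short", or None if no control token leads the response."""
    stripped = response.lstrip()
    if stripped.startswith(THINK):
        return "think"
    if stripped.startswith(SHORT):
        return "short"
    return None


def think_usage(responses):
    """Fraction of recognizable responses that used long-chain reasoning."""
    modes = [detect_mode(r) for r in responses]
    known = [m for m in modes if m is not None]
    if not known:
        return 0.0
    return sum(m == "think" for m in known) / len(known)


if __name__ == "__main__":
    # Made-up strings for illustration, not real model outputs.
    logged = ["<think>Let me work through this...", "<short>x = 17", "<short>42"]
    print(f"Long-chain usage: {think_usage(logged):.0%}")
```

Comparing this fraction against a baseline that always reasons at length (usage of 100%) gives the reduction in long-chain usage.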

Model Capabilities

Adaptive Text Generation

Mathematical Reasoning

Question Answering

Use Cases

Education

Mathematical Problem Solving

Solves mathematical problems such as algebra and arithmetic

Performs well on benchmarks like Minerva Algebra, MATH-500, and GSM8K

Research

Reasoning Mode Research

Investigates the adaptive reasoning capabilities of large models

Validates that the model effectively learns when to use long-chain reasoning

🚀 Thinkless: LLM Learns When to Think

Thinkless is a learnable framework that enables an LLM to adaptively choose between short-form and long-form reasoning. It lowers the computational cost of Reasoning Language Models by cutting the use of long-chain thinking by 50%-90% on benchmarks such as Minerva Algebra, MATH-500, and GSM8K.


Property	Details
Paper Link	ArXiv
GitHub	VainF/Thinkless
RL Model	Thinkless-1.5B-RL-DeepScaleR
Warmup Model	Thinkless-1.5B-Warmup
Data for Warmup	Hybrid-OpenThoughts2-1M-1.5B
Data for RL	agentica-org/DeepScaleR-Preview-Dataset

🚀 Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Vinnnf/Thinkless-1.5B-Warmup"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

instruction = "Please reason step by step, and put your final answer within \\boxed{}."
prompt = f"{instruction}\nThe arithmetic mean of 7, 2, $x$ and 10 is 9. What is the value of $x$?"

messages = [
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

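# Force the reasoning mode by appending a control token to the prompt:
# <think> triggers detailed reasoning, <short> triggers a concise response.
think_mode = True
if think_mode:
    text = f"{text}<think>"
else:
    text = f"{text}<short>"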

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=4096
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
num_tokens = len(generated_ids[0])

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(text+response)
print(f"\nThink Mode: {think_mode}")
print(f"Number of tokens: {num_tokens}")
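
To check the result programmatically, the final answer can be pulled out of the \boxed{} wrapper that the instruction asks for. This is a minimal sketch, assuming the response contains a literal \boxed{...}; `extract_boxed` is a hypothetical helper, not part of Transformers or the Thinkless repository. For the example prompt, the condition (7 + 2 + x + 10) / 4 = 9 gives x = 36 - 19 = 17.

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are inside the opening brace of \boxed{
    out = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)
        out.append(ch)
    return None  # braces never balanced (e.g. generation was truncated)


# Expected answer for the example prompt: the four values must sum to 4 * 9.
expected_x = 4 * 9 - (7 + 2 + 10)

if __name__ == "__main__":
    sample = "The sum must be 36, so x = 36 - 19. Final answer: \\boxed{17}"
    print(extract_boxed(sample) == str(expected_x))
```

In the quick-start script, `extract_boxed(response)` would be applied to the decoded `response` string; it returns None when generation stops before the closing brace.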

✨ Features

Note: Can LLMs learn when to think?

We propose Thinkless, a learnable framework that empowers an LLM to adaptively select between short-form and long-form reasoning based on both task complexity and the model's ability. Thinkless is trained under a reinforcement learning paradigm and employs two control tokens, <short> for concise responses and <think> for detailed reasoning. At the core of our method is a Decoupled Group Relative Policy Optimization (DeGRPO) algorithm, which decomposes the learning objective of hybrid reasoning into two components: (1) a control token loss that governs the selection of the reasoning mode, and (2) a response loss that improves the accuracy of the generated answers. This decoupled formulation enables fine-grained control over the contribution of each objective, stabilizing training and effectively preventing the collapse observed in vanilla GRPO. Empirically, on several benchmarks such as Minerva Algebra, MATH-500, and GSM8K, Thinkless is able to reduce the usage of long-chain thinking by 50%-90%, significantly reducing the computational cost of Reasoning Language Models.
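
The decoupling described above can be illustrated with a toy calculation. The sketch below is not the authors' implementation: the function names, the `alpha` weight on the control token term, and the per-response length averaging are assumptions made for illustration, and it omits the clipped importance ratio and KL penalty that GRPO-style objectives normally include. It only shows the idea of scoring the control token and the response tokens as two separately weighted terms that share one group-relative advantage.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def decoupled_loss(control_logps, response_logps, rewards, alpha=1.0):
    """Toy decoupled objective (assumed form, for illustration only).

    control_logps:  log-prob of the chosen control token, one per sample
    response_logps: list of per-token log-probs for each sampled response
    rewards:        scalar reward per sample
    alpha:          hypothetical weight on the control token term
    """
    advs = group_relative_advantages(rewards)
    # Mode selection term: one token per sample, independent of response length.
    control_loss = -mean([a * lp for a, lp in zip(advs, control_logps)])
    # Answer quality term: averaged over each response's own tokens.
    response_loss = -mean([a * mean(lps) for a, lps in zip(advs, response_logps)])
    return alpha * control_loss + response_loss, control_loss, response_loss


if __name__ == "__main__":
    # Two sampled responses: a correct short one and an incorrect long one (made-up numbers).
    total, ctrl, resp = decoupled_loss(
        control_logps=[-0.1, -2.0],
        response_logps=[[-0.5, -0.5], [-1.0, -1.0, -1.0, -1.0]],
        rewards=[1.0, 0.0],
        alpha=1.0,
    )
    print(total, ctrl, resp)
```

Because the control token term is computed separately, its influence does not shrink as responses get longer, and `alpha` can be tuned without touching the response term.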

📚 Documentation

Pipeline

[Figure: Thinkless pipeline]

📄 License

This project is licensed under the Apache-2.0 license.

📚 Citation

If you find this work helpful, please cite:

@article{fang2025thinkless,
  title={Thinkless: LLM Learns When to Think},
  author={Fang, Gongfan and Ma, Xinyin and Wang, Xinchao},
  journal={arXiv preprint arXiv:2505.13379},
  year={2025}
}
