# ELYZA-Shortcut-1.0-Qwen-32B
ELYZA-Shortcut-1.0-Qwen-32B is a non-reasoning model derived during the development of the reasoning model ELYZA-Thinking-1.0-Qwen-32B. Based on Qwen/Qwen2.5-32B-Instruct, it was post-trained to bypass the step-by-step reasoning process and generate final answers directly.

## 🚀 Quick Start
This model can be used with the Hugging Face Transformers library. You can follow the steps below to start using it.
## ✨ Features
- Derived Model: A non-reasoning model derived during the development of ELYZA-Thinking-1.0-Qwen-32B.
- Direct Answer Generation: Based on Qwen/Qwen2.5-32B-Instruct, it generates final answers directly, without step-by-step reasoning, after post-training.
- Post-training Method: Post-training uses supervised fine-tuning (SFT) on problem-solution pairs obtained by removing the reasoning steps from optimal reasoning paths explored with an MCTS-based algorithm.
## 📦 Installation
To use this model, you need to install the necessary libraries. You can install the transformers library (and torch) with the following command:

```bash
pip install transformers torch
```
For deployment, you can install vLLM to create an OpenAI-compatible server:

```bash
pip install vllm
```
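After installation, a quick sanity check like the one below (a minimal sketch, not specific to this model) confirms that the libraries import correctly and whether a GPU is visible, which matters before loading a 32B model:

```python
import torch
import transformers

# Verify the installed versions and GPU availability.
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```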
## 💻 Usage Examples
### Basic Usage
The following code shows how to use the model for inference:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "elyza/ELYZA-Shortcut-1.0-Qwen-32B"
# "Please list five ideas for regaining enthusiasm for your work."
prompt = "仕事の熱意を取り戻すためのアイデアを5つ挙げてください。"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
model.eval()

# Format the user message with the model's chat template.
messages = [{"role": "user", "content": prompt}]
input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
token_ids = tokenizer.encode(
    input_text, add_special_tokens=False, return_tensors="pt"
)

with torch.no_grad():
    output_ids = model.generate(
        token_ids.to(model.device),
        max_new_tokens=8192,
        do_sample=True,
        temperature=0.6,
        top_p=0.95,
    )

# Decode only the newly generated tokens, skipping the prompt.
output = tokenizer.decode(
    output_ids.tolist()[0][token_ids.size(1):], skip_special_tokens=True
)
print(output)
```
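If you would rather see tokens printed as they are generated instead of waiting for the full completion, transformers provides TextStreamer. A minimal sketch, reusing the model, tokenizer, and token_ids from the example above:

```python
from transformers import TextStreamer

# Streams decoded text to stdout as tokens are generated,
# omitting the prompt and special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

with torch.no_grad():
    model.generate(
        token_ids.to(model.device),
        max_new_tokens=8192,
        do_sample=True,
        temperature=0.6,
        top_p=0.95,
        streamer=streamer,
    )
```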
### Advanced Usage
For deployment, vLLM is recommended for creating an OpenAI-compatible server. The following command shows how to serve the model using vLLM:
```bash
vllm serve elyza/ELYZA-Shortcut-1.0-Qwen-32B \
    --tensor-parallel-size 8 \
    --max-model-len 32768
```
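Once the server is up, it exposes an OpenAI-compatible API (by default at http://localhost:8000/v1), so you can query it with the openai Python client. A minimal sketch, assuming the default host and port:

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server does not check API keys;
# "EMPTY" is the conventional placeholder.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="elyza/ELYZA-Shortcut-1.0-Qwen-32B",
    messages=[
        {"role": "user", "content": "Please list five ideas for regaining enthusiasm for your work."}
    ],
    temperature=0.6,
    top_p=0.95,
)
print(response.choices[0].message.content)
```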
## 📚 Documentation
### Model Description
ELYZA-Shortcut-1.0-Qwen-32B is a non-reasoning model derived during the development of the reasoning model ELYZA-Thinking-1.0-Qwen-32B. Based on Qwen/Qwen2.5-32B-Instruct, the model has been post-trained to bypass the step-by-step reasoning process and directly generate final answers (Built with Qwen).
During the post-training phase, the model was trained via supervised fine-tuning (SFT) on problem-solution pairs. These pairs were obtained by removing the reasoning steps from optimal reasoning paths explored with a Monte Carlo Tree Search (MCTS)-based algorithm. For more details, please refer to our blog post.
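The card does not specify the data format used for this step. As a purely hypothetical illustration, the sketch below assumes reasoning traces delimited by `<think>...</think>` tags (a common convention for reasoning traces) and strips them, leaving problem-solution pairs for SFT:

```python
import re

# Hypothetical delimiter for reasoning traces; the actual format is not documented here.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def to_sft_pair(problem: str, traced_solution: str) -> dict:
    """Drop the reasoning trace, keeping only the final answer text."""
    answer = THINK_BLOCK.sub("", traced_solution).strip()
    return {"messages": [
        {"role": "user", "content": problem},
        {"role": "assistant", "content": answer},
    ]}

# Hypothetical example: a traced solution found by a search procedure.
pair = to_sft_pair(
    "What is 17 * 24?",
    "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.</think>The answer is 408.",
)
print(pair)
```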
### How to Cite
If you use this model in your work, please cite it as follows:
```bibtex
@misc{elyza2025thinking,
      title={elyza/ELYZA-Thinking-1.0-Qwen-32B},
      url={https://huggingface.co/elyza/ELYZA-Thinking-1.0-Qwen-32B},
      author={Masato Hirakawa and Tomoaki Nakamura and Akira Sasaki and Daisuke Oba and Shoetsu Sato},
      year={2025},
}
```
### Citations
```bibtex
@misc{qwen2.5,
      title={Qwen2.5: A Party of Foundation Models},
      url={https://qwenlm.github.io/blog/qwen2.5/},
      author={Qwen Team},
      month={September},
      year={2024}
}

@article{qwen2,
      title={Qwen2 Technical Report},
      author={An Yang and Baosong Yang and Binyuan Hui and Bo Zheng and Bowen Yu and Chang Zhou and Chengpeng Li and Chengyuan Li and Dayiheng Liu and Fei Huang and Guanting Dong and Haoran Wei and Huan Lin and Jialong Tang and Jialin Wang and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Ma and Jin Xu and Jingren Zhou and Jinze Bai and Jinzheng He and Junyang Lin and Kai Dang and Keming Lu and Keqin Chen and Kexin Yang and Mei Li and Mingfeng Xue and Na Ni and Pei Zhang and Peng Wang and Ru Peng and Rui Men and Ruize Gao and Runji Lin and Shijie Wang and Shuai Bai and Sinan Tan and Tianhang Zhu and Tianhao Li and Tianyu Liu and Wenbin Ge and Xiaodong Deng and Xiaohuan Zhou and Xingzhang Ren and Xinyu Zhang and Xipin Wei and Xuancheng Ren and Yang Fan and Yang Yao and Yichang Zhang and Yu Wan and Yunfei Chu and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zhihao Fan},
      journal={arXiv preprint arXiv:2407.10671},
      year={2024}
}
```
## 📄 License
This project is licensed under the Apache-2.0 license.