🚀 AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale
We are excited to present AM-Thinking-v1, a 32B dense language model meticulously crafted to elevate reasoning capabilities. Built upon the foundation of Qwen 2.5-32B-Base, this model demonstrates remarkable performance on reasoning benchmarks, standing toe-to-toe with significantly larger Mixture-of-Experts (MoE) models such as DeepSeek-R1, Qwen3-235B-A22B, and Seed1.5-Thinking, and even with larger dense models like Nemotron-Ultra-253B-v1.
🤗 Hugging Face | 📑 Paper | 📌 Blog
🚀 Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "a-m-team/AM-Thinking-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

prompt = "How can I find inner peace?"
messages = [
    {"role": "user", "content": prompt}
]
# The chat template automatically prepends the system prompt that is
# bundled in the tokenizer configuration (see the note below).
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=49152
)
# Keep only the newly generated tokens, dropping the prompt tokens.
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
response = tokenizer.decode(output_ids, skip_special_tokens=True)

# The model emits its reasoning inside <think>...</think> and its final
# reply inside <answer>...</answer>.
think_content = response.split("<think>")[1].split("</think>")[0]
answer_content = response.split("<answer>")[1].split("</answer>")[0]

print(f"user prompt: {prompt}")
print(f"model thinking: {think_content}")
print(f"model answer: {answer_content}")
```
⚠️ Important Note
We have included the system prompt in the tokenizer configuration, as it was used during both the SFT and RL stages. To ensure consistent output quality, we recommend including the same system prompt during actual usage; otherwise, the model's responses may be significantly affected.
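If your serving stack builds prompts itself rather than using the chat template, it is worth confirming that the bundled system prompt actually reaches the model. Below is a minimal sketch, assuming the template renders the default system prompt whenever no system message is supplied:

```python
# Minimal check that the system prompt stored in the tokenizer
# configuration is rendered by the chat template. We render a trivial
# conversation and inspect the result before deploying.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("a-m-team/AM-Thinking-v1")
rendered = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    tokenize=False,
    add_generation_prompt=True,
)
# The default system prompt should appear at the start of the string.
print(rendered)
```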
Quantized versions for compact devices

A series of quantized versions of the AM-Thinking-v1 model, for use with llama.cpp and Ollama, is available at AM-Thinking-v1-gguf.
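For local inference on a quantized checkpoint, a minimal sketch using the llama-cpp-python bindings might look as follows; the GGUF file name and the context size are illustrative assumptions, not the exact artifacts in that repository:

```python
# Illustrative sketch: running a GGUF quantization with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./AM-Thinking-v1-Q4_K_M.gguf",  # hypothetical file name
    n_ctx=8192,  # reasoning traces are long; raise this if memory allows
)
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "How can I find inner peace?"}],
    max_tokens=4096,
)
print(result["choices"][0]["message"]["content"])
```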
⨠Features
Why Another 32B Reasoning Model Matters
Large Mixture-of-Experts (MoE) models such as DeepSeek-R1 or Qwen3-235B-A22B dominate the leaderboards, but they also demand clusters of high-end GPUs. Many teams just need the best dense model that fits on a single card.
AM-Thinking-v1 fills that gap while remaining fully based on open-source components:
- Outperforms DeepSeek-R1 on AIME 2024/2025 and LiveCodeBench, and approaches Qwen3-235B-A22B despite having roughly 1/7th the parameter count.
- Built on the publicly available Qwen 2.5-32B-Base, with RL training queries drawn from publicly available sources as well.
- Shows that a well-designed post-training pipeline (SFT + dual-stage RL) can squeeze flagship-level reasoning out of a 32B dense model.
- Deploys on a single A100 80 GB GPU with deterministic latency and no MoE routing overhead (see the serving sketch below).
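As one illustration of single-card deployment, here is a minimal serving sketch using vLLM; the serving stack and the sampling settings are assumptions made for this example, not prescriptions from the model card:

```python
# Minimal single-GPU serving sketch with vLLM (an assumed stack choice).
from vllm import LLM, SamplingParams

llm = LLM(model="a-m-team/AM-Thinking-v1")
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=8192)

conversation = [{"role": "user", "content": "How can I find inner peace?"}]
outputs = llm.chat(conversation, params)
print(outputs[0].outputs[0].text)
```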
AM-Thinking-v1 achieves strong reasoning performance with significantly fewer parameters.
Use Cases
Code Generation
Prompt:
write a python script for a bouncing red ball within a triangle, make sure to handle collision detection properly. make the triangle slowly rotate. implement it in python. make sure ball stays within the triangle
Logic
Writing
🔧 Technical Details
Post-training pipeline
To achieve its strong reasoning ability, AM-Thinking-v1 goes through a carefully designed post-training pipeline.
Below we describe the key stages involved in turning a base model into a high-performing reasoner:
Step 1: Cold-start SFT.
We begin with the open-source Qwen 2.5-32B-Base and run a broad supervised fine-tune on a blended training dataset of math, code, and open-domain chat. This endows the model with a "think-then-answer" behavioural pattern and equips it with an initial capacity for reasoning.
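To make the "think-then-answer" pattern concrete, an SFT target from this stage can be pictured as an assistant turn wrapped in the same tags the quick-start code parses. This is a schematic sketch; the dict layout and the example itself are illustrative, not the team's actual data schema:

```python
# Schematic SFT example in the think-then-answer format. The layout is
# illustrative; only the <think>/<answer> tag convention comes from the
# quick-start code above.
sft_example = {
    "messages": [
        {"role": "user", "content": "What is 17 * 24?"},
        {
            "role": "assistant",
            "content": (
                "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.</think>"
                "<answer>17 * 24 = 408.</answer>"
            ),
        },
    ]
}
```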
Step 2: Pass-rate-aware data curation.
Before any RL, the SFT model is evaluated on every math- and code-oriented training query. For each item we log a pass rate, and only queries with 0 < pass rate < 1 are kept. In effect, we discard problems the model already masters and those it completely fails, concentrating learning on genuinely informative cases.
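In code, the curation rule is simple. The sketch below assumes hypothetical `generate` and `is_correct` hooks standing in for the real pipeline, with k completions sampled per query; all names are illustrative:

```python
# Sketch of pass-rate-aware curation: sample k completions per query and
# keep only queries the SFT model solves sometimes but not always.
def curate(queries, generate, is_correct, k=16):
    """generate(query) -> completion; is_correct(query, completion) -> bool.
    Both are hypothetical hooks, not part of the released code."""
    kept = []
    for query in queries:
        passes = sum(is_correct(query, generate(query)) for _ in range(k))
        pass_rate = passes / k
        if 0 < pass_rate < 1:  # drop mastered and hopeless items alike
            kept.append((query, pass_rate))
    return kept
```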
Step 3: Reinforcement learning.
We adopt a two-stage GRPO (Group Relative Policy Optimization) scheme. Stage 1 trains only on math and code queries. Once it converges, Stage 2 begins by removing every query the model answered 100% correctly in Stage 1 and adjusting key hyper-parameters such as the maximum generation length and the learning rate.
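For reference, the group-relative advantage at the heart of GRPO can be sketched as follows; this is a generic illustration of the algorithm, not the team's exact implementation or hyper-parameters:

```python
import numpy as np

def grpo_advantages(group_rewards):
    """Group-relative advantages: normalize each sampled response's reward
    by the mean and standard deviation of its group (all samples drawn for
    the same query)."""
    rewards = np.asarray(group_rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Example: 8 sampled answers to one query, reward 1 if correct else 0.
print(grpo_advantages([1, 0, 0, 1, 1, 0, 0, 0]))
```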
⚠️ Limitations
While AM-Thinking-v1 excels at pure language reasoning and open-domain chat, it has not yet been trained for structured function-calling or tool-use workflows, which restricts its usefulness in agent-style applications that must act on external systems.
Improving the model's ability to follow complex instructions is also an important direction for our future work.
In addition, our safety alignment is still at an early stage, so more rigorous red-teaming is required to reduce potential harms.
📚 Documentation
The a-m-team is an internal team at Beike (Ke.com) dedicated to exploring AGI technology.
If you find our work helpful, feel free to cite it:
```bibtex
@misc{ji2025amthinkingv1advancingfrontierreasoning,
      title={AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale},
      author={Yunjie Ji and Xiaoyu Tian and Sitong Zhao and Haotian Wang and Shuaiting Chen and Yiping Peng and Han Zhao and Xiangang Li},
      year={2025},
      eprint={2505.08311},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.08311},
}
```
📄 License
This project is licensed under the Apache-2.0 license.