🚀 AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale
We are excited to present AM-Thinking-v1, a 32B dense language model meticulously crafted to elevate reasoning capabilities. Built upon the foundation of Qwen 2.5-32B-Base, this model demonstrates remarkable performance on reasoning benchmarks, standing toe-to-toe with significantly larger Mixture-of-Experts (MoE) models such as DeepSeek-R1, Qwen3-235B-A22B, and Seed1.5-Thinking, and even with larger dense models like Nemotron-Ultra-253B-v1.
🤗 Hugging Face | 📑 Paper | 📌 Blog
🚀 Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "a-m-team/AM-Thinking-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

prompt = "How can I find inner peace?"
messages = [
    {"role": "user", "content": prompt}
]
# The chat template automatically prepends the system prompt that is
# bundled in the tokenizer configuration (see the note below).
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=49152
)
# Keep only the newly generated tokens, dropping the prompt tokens.
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
response = tokenizer.decode(output_ids, skip_special_tokens=True)

# The model emits its reasoning inside <think>...</think> and its final
# reply inside <answer>...</answer>.
think_content = response.split("<think>")[1].split("</think>")[0]
answer_content = response.split("<answer>")[1].split("</answer>")[0]

print(f"user prompt: {prompt}")
print(f"model thinking: {think_content}")
print(f"model answer: {answer_content}")
```
⚠️ Important Note
We have included the system prompt in the tokenizer configuration, as it was used during both the SFT and RL stages. To ensure consistent output quality, we recommend including the same system prompt during actual usage; otherwise, the model's responses may be significantly affected.
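If your serving stack builds prompts itself rather than using the chat template, it is worth confirming that the bundled system prompt actually reaches the model. Below is a minimal sketch, assuming the template renders the default system prompt whenever no system message is supplied:

```python
# Minimal check that the system prompt stored in the tokenizer
# configuration is rendered by the chat template. We render a trivial
# conversation and inspect the result before deploying.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("a-m-team/AM-Thinking-v1")
rendered = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    tokenize=False,
    add_generation_prompt=True,
)
# The default system prompt should appear at the start of the string.
print(rendered)
```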
Quantized versions for compact devices

A series of quantized versions of the AM-Thinking-v1 model, for use with llama.cpp and Ollama, is available at AM-Thinking-v1-gguf.
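For local inference on a quantized checkpoint, a minimal sketch using the llama-cpp-python bindings might look as follows; the GGUF file name and the context size are illustrative assumptions, not the exact artifacts in that repository:

```python
# Illustrative sketch: running a GGUF quantization with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./AM-Thinking-v1-Q4_K_M.gguf",  # hypothetical file name
    n_ctx=8192,  # reasoning traces are long; raise this if memory allows
)
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "How can I find inner peace?"}],
    max_tokens=4096,
)
print(result["choices"][0]["message"]["content"])
```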
⨠Features
Why Another 32B Reasoning Model Matters
Large Mixture-of-Experts (MoE) models such as DeepSeek-R1 or Qwen3-235B-A22B dominate the leaderboards, but they also demand clusters of high-end GPUs. Many teams just need the best dense model that fits on a single card.
AM-Thinking-v1 fills that gap while remaining fully based on open-source components:
- Outperforms DeepSeek-R1 on AIME 2024/2025 and LiveCodeBench, and approaches Qwen3-235B-A22B despite having roughly 1/7th the parameter count.
- Built on the publicly available Qwen 2.5-32B-Base, with RL training queries drawn from publicly available sources as well.
- Shows that a well-designed post-training pipeline (SFT + dual-stage RL) can squeeze flagship-level reasoning out of a 32B dense model.
- Deploys on a single A100 80 GB GPU with deterministic latency and no MoE routing overhead (see the serving sketch below).
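As one illustration of single-card deployment, here is a minimal serving sketch using vLLM; the serving stack and the sampling settings are assumptions made for this example, not prescriptions from the model card:

```python
# Minimal single-GPU serving sketch with vLLM (an assumed stack choice).
from vllm import LLM, SamplingParams

llm = LLM(model="a-m-team/AM-Thinking-v1")
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=8192)

conversation = [{"role": "user", "content": "How can I find inner peace?"}]
outputs = llm.chat(conversation, params)
print(outputs[0].outputs[0].text)
```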
AM-Thinking-v1 achieves strong reasoning performance with significantly fewer parameters.
Use Cases
Code Generation
Prompt:
write a python script for a bouncing red ball within a triangle, make sure to handle collision detection properly. make the triangle slowly rotate. implement it in python. make sure ball stays within the triangle
Logic
Writing
🔧 Technical Details
Post-training pipeline
To achieve its strong reasoning ability, AM-Thinking-v1 goes through a carefully designed post-training pipeline.
Below we describe the key stages involved in turning a base model into a high-performing reasoner:
Step 1: Cold-start SFT.
We begin with the open-source Qwen 2.5-32B-Base and run a broad supervised fine-tune on a blended training dataset of math, code, and open-domain chat. This endows the model with a "think-then-answer" behavioural pattern and equips it with an initial capacity for reasoning.
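To make the "think-then-answer" pattern concrete, an SFT target from this stage can be pictured as an assistant turn wrapped in the same tags the quick-start code parses. This is a schematic sketch; the dict layout and the example itself are illustrative, not the team's actual data schema:

```python
# Schematic SFT example in the think-then-answer format. The layout is
# illustrative; only the <think>/<answer> tag convention comes from the
# quick-start code above.
sft_example = {
    "messages": [
        {"role": "user", "content": "What is 17 * 24?"},
        {
            "role": "assistant",
            "content": (
                "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.</think>"
                "<answer>17 * 24 = 408.</answer>"
            ),
        },
    ]
}
```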
Step 2: Pass-rate-aware data curation.
Before any RL, the SFT model is evaluated on every math- and code-oriented training query. For each item we log a pass rate, and only queries with 0 < pass rate < 1 are kept. In effect, we discard problems the model already masters and those it completely fails, concentrating learning on genuinely informative cases.
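In code, the curation rule is simple. The sketch below assumes hypothetical `generate` and `is_correct` hooks standing in for the real pipeline, with k completions sampled per query; all names are illustrative:

```python
# Sketch of pass-rate-aware curation: sample k completions per query and
# keep only queries the SFT model solves sometimes but not always.
def curate(queries, generate, is_correct, k=16):
    """generate(query) -> completion; is_correct(query, completion) -> bool.
    Both are hypothetical hooks, not part of the released code."""
    kept = []
    for query in queries:
        passes = sum(is_correct(query, generate(query)) for _ in range(k))
        pass_rate = passes / k
        if 0 < pass_rate < 1:  # drop mastered and hopeless items alike
            kept.append((query, pass_rate))
    return kept
```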
Step 3: Reinforcement learning.
We adopt a two-stage GRPO (Group Relative Policy Optimization) scheme. Stage 1 trains only on math and code queries. Once it converges, Stage 2 begins by removing every query the model answered 100% correctly in Stage 1 and adjusting key hyper-parameters such as the maximum generation length and the learning rate.
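For reference, the group-relative advantage at the heart of GRPO can be sketched as follows; this is a generic illustration of the algorithm, not the team's exact implementation or hyper-parameters:

```python
import numpy as np

def grpo_advantages(group_rewards):
    """Group-relative advantages: normalize each sampled response's reward
    by the mean and standard deviation of its group (all samples drawn for
    the same query)."""
    rewards = np.asarray(group_rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Example: 8 sampled answers to one query, reward 1 if correct else 0.
print(grpo_advantages([1, 0, 0, 1, 1, 0, 0, 0]))
```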
⚠️ Limitations
While AM-Thinking-v1 excels at pure language reasoning and open-domain chat, it has not yet been trained for structured function-calling or tool-use workflows, which restricts its usefulness in agent-style applications that must act on external systems.
Improving the model's ability to follow complex instructions is also an important direction for our future work.
In addition, our safety alignment is still at an early stage, so more rigorous red-teaming is required to reduce potential harms.
📚 Documentation
The a-m-team is an internal team at Beike (Ke.com) dedicated to exploring AGI technology.
If you find our work helpful, feel free to cite it:
```bibtex
@misc{ji2025amthinkingv1advancingfrontierreasoning,
      title={AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale},
      author={Yunjie Ji and Xiaoyu Tian and Sitong Zhao and Haotian Wang and Shuaiting Chen and Yiping Peng and Han Zhao and Xiangang Li},
      year={2025},
      eprint={2505.08311},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.08311},
}
```
📄 License
This project is licensed under the Apache-2.0 license.