DAPO-Qwen-32B open-source large language model - free math problem solving, with multilingual text generation


DAPO Qwen 32B

Developed by BytedTsinghua-SIA

A large language model trained from Qwen2.5-32B with the DAPO algorithm, focused on math problem solving and multilingual text generation

Large language model

Safetensors

Supports multiple languages

Open-source license: Apache-2.0

#MathReasoningEnhancement #MultilingualMathSolving #DAPOOptimization

Downloads: 7,241

Release date: 4/9/2025

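Model overview

DAPO-Qwen-32B is a large language model trained from Qwen2.5-32B with the DAPO algorithm. It is intended mainly for solving complex math problems and for multilingual text generation.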
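This card does not describe how DAPO works. For orientation, the DAPO paper cited at the end of this page describes a PPO-style clipped policy-gradient objective with several changes, among them decoupled lower and upper clip ranges ("Clip-Higher"), advantages normalized within a group of sampled answers to the same prompt, and a loss averaged over all tokens in the group rather than per answer. The following is a minimal, framework-free sketch of that objective, assuming plain Python lists of per-token probability ratios, a population standard deviation for the advantage, and clip values of 0.2 (lower) and 0.28 (upper) as defaults. It is an illustration only, not the authors' training code.

```python
import math
from typing import List


def group_advantages(rewards: List[float]) -> List[float]:
    """Normalize the rewards of one prompt's sampled answers: (R_i - mean) / std."""
    mean = sum(rewards) / len(rewards)
    # Population standard deviation (an assumption of this sketch)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    if std == 0.0:
        # Identical rewards carry no learning signal; DAPO's dynamic sampling filters such groups out
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]


def clipped_token_level_loss(
    ratios: List[List[float]],
    rewards: List[float],
    eps_low: float = 0.2,
    eps_high: float = 0.28,
) -> float:
    """Negative clipped surrogate, averaged over every token of every answer in the group.

    ratios[i][t] is pi_new / pi_old for token t of answer i; rewards[i] is answer i's reward.
    """
    advantages = group_advantages(rewards)
    total_tokens = sum(len(seq) for seq in ratios)
    objective = 0.0
    for seq_ratios, adv in zip(ratios, advantages):
        for ratio in seq_ratios:
            # Asymmetric clip range: [1 - eps_low, 1 + eps_high]
            clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
            objective += min(ratio * adv, clipped * adv)
    # Dividing by the total token count weights every token equally, so longer answers count more
    return -objective / total_tokens
```

For example, clipped_token_level_loss([[1.0, 1.0, 1.0], [1.0]], [1.0, -1.0]) returns -0.5: the correct three-token answer outweighs the incorrect one-token answer because the average is taken per token, not per answer.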

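Model features

DAPO training

Trained with the DAPO algorithm, which improves the model's performance on math problem solving

Multilingual support

Text generation and understanding in 13 languages

Math problem solving

Specifically optimized for complex math problems: the model reasons step by step and then states a final answer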
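The inference example on this page asks the model to end its response with a line of the form "Answer: ...", which makes the final answer easy to check automatically. Below is a minimal sketch of such a check, in the spirit of the binary rule-based reward (+1 for a correct final answer, -1 otherwise) described in the DAPO paper. The function names are hypothetical, and the exact string comparison is a simplifying assumption: practical graders usually normalize answers (spacing, LaTeX, equivalent forms) before comparing.

```python
import re
from typing import Optional

# Captures the text after an "Answer:" marker; "." stops at the next newline
ANSWER_PATTERN = re.compile(r"Answer:\s*(.+)")


def extract_final_answer(response: str) -> Optional[str]:
    """Return the text after the last "Answer:" marker, or None if there is none."""
    matches = ANSWER_PATTERN.findall(response)
    if not matches:
        return None
    return matches[-1].strip()


def answer_reward(response: str, ground_truth: str) -> float:
    """+1.0 if the extracted answer exactly matches the reference, else -1.0."""
    predicted = extract_final_answer(response)
    if predicted is not None and predicted == ground_truth.strip():
        return 1.0
    return -1.0
```

Taking the last match means an earlier "Answer:" mentioned mid-reasoning does not override the final line.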

Capabilities

Math problem solving

Multilingual text generation

Step-by-step reasoning

Answering complex questions

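Use cases

Education

Math problem solving

Helps students work through complex math problems with step-by-step solutions

Can accurately answer many kinds of math problems, including algebra and geometry

Multilingual applications

Multilingual text generation

Text generation in 13 languages

Generates fluent text in each of these languages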

🚀 DAPO-Qwen-32B

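DAPO-Qwen-32B is trained from Qwen2.5-32B with the DAPO algorithm. It can be used for text generation and supports multiple languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic.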
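A 32B-parameter model is large enough that its weights alone exceed the memory of most single GPUs, which helps explain why the quick-start example loads it in bfloat16 with tensor_parallel_size=8. The rough estimate below assumes a nominal 32 billion parameters at 2 bytes each (bfloat16), an even split across GPUs, and counts only the weights; the KV cache for long generations and runtime overhead come on top, and vLLM sizes its reservation according to gpu_memory_utilization.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2, num_gpus: int = 1) -> float:
    """Approximate GiB needed for the weights alone, assuming an even split across GPUs."""
    total_bytes = num_params * bytes_per_param
    return total_bytes / num_gpus / 2**30


# Nominal 32B parameters in bfloat16 (2 bytes each); the real parameter count differs slightly
print(f"single GPU: {weight_memory_gib(32e9):.1f} GiB")
print(f"per GPU with 8-way tensor parallelism: {weight_memory_gib(32e9, num_gpus=8):.1f} GiB")
```

This prints about 59.6 GiB for one GPU and 7.5 GiB per GPU when sharded eight ways, before any KV cache is allocated.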

🚀 Quick start

Inference example

import torch
from transformers import AutoTokenizer
from vllm import SamplingParams, LLM

examples = [
    {
        "question": "Solve the following math problem step by step. The last line of your response should be of the form Answer: $Answer (without quotes) where $Answer is the answer to the problem.\n\nFind the largest possible real part of \\[(75+117i)z+\\frac{96+144i}{z}\\]where $z$ is a complex number with $|z|=4$.\n\nRemember to put your answer on its own line after \"Answer:\".",
        "answer": "540"
    },
    {
        "question": "Solve the following math problem step by step. The last line of your response should be of the form Answer: $Answer (without quotes) where $Answer is the answer to the problem.\n\nEvery morning Aya goes for a $9$-kilometer-long walk and stops at a coffee shop afterwards. When she walks at a constant speed of $s$ kilometers per hour, the walk takes her 4 hours, including $t$ minutes spent in the coffee shop. When she walks $s+2$ kilometers per hour, the walk takes her 2 hours and 24 minutes, including $t$ minutes spent in the coffee shop. Suppose Aya walks at $s+\\frac{1}{2}$ kilometers per hour. Find the number of minutes the walk takes her, including the $t$ minutes spent in the coffee shop.\n\nRemember to put your answer on its own line after \"Answer:\".",
        "answer": "204"
    },
    {
        "question": "Solve the following math problem step by step. The last line of your response should be of the form Answer: $Answer (without quotes) where $Answer is the answer to the problem.\n\nLet $\\mathcal{B}$ be the set of rectangular boxes with surface area $54$ and volume $23$. Let $r$ be the radius of the smallest sphere that can contain each of the rectangular boxes that are elements of $\\mathcal{B}$. The value of $r^2$ can be written as $\\frac{p}{q}$, where $p$ and $q$ are relatively prime positive integers. Find $p+q$.\n\nRemember to put your answer on its own line after \"Answer:\".",
        "answer": "721"
    }
]


def main():
    model = "BytedTsinghua-SIA/DAPO-Qwen-32B"

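    # The tokenizer supplies the chat template used to format each prompt
    tokenizer = AutoTokenizer.from_pretrained(model)

    # Load the bf16 weights sharded across 8 GPUs with tensor parallelism
    llm = LLM(
        model=model,
        dtype=torch.bfloat16,
        tensor_parallel_size=8,
        gpu_memory_utilization=0.95
    )

    # max_tokens leaves room for long step-by-step reasoning
    sampling_params = SamplingParams(
        temperature=1.0,
        top_p=0.7,
        max_tokens=20480
    )

    for example in examples:
        question = example["question"]
        answer = example["answer"]
        # Wrap the question as a single user turn and append the assistant generation prompt
        prompt = tokenizer.apply_chat_template(
            conversation=[{"content": question, "role": "user"}],
            add_generation_prompt=True,
            tokenize=False
        )
        output = llm.generate(prompts=prompt, sampling_params=sampling_params)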
        print(f"***QUESTION***:\n{question}\n***GROUND TRUTH***:\n{answer}\n***MODEL OUTPUT***:\n{output[0].outputs[0].text}\n")
        print("-"*100)

if __name__ == "__main__":
    main()
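All three examples wrap the problem in the same English instruction, which asks for step-by-step work and a final "Answer:" line. To try other problems, a small helper can rebuild that exact wrapper. The names PROMPT_PREFIX, PROMPT_SUFFIX and build_question are hypothetical and not part of the model release.

```python
# Instruction text copied from the examples above (hypothetical helper names)
PROMPT_PREFIX = (
    "Solve the following math problem step by step. The last line of your response "
    "should be of the form Answer: $Answer (without quotes) where $Answer is the "
    "answer to the problem.\n\n"
)
PROMPT_SUFFIX = '\n\nRemember to put your answer on its own line after "Answer:".'


def build_question(problem: str) -> str:
    """Wrap a bare problem statement in the instruction format used by the examples."""
    return PROMPT_PREFIX + problem.strip() + PROMPT_SUFFIX


if __name__ == "__main__":
    print(build_question("What is the sum of the first $10$ positive integers?"))
```

The resulting string can go through the same apply_chat_template call as in the example. For several problems at once, vLLM's LLM.generate also accepts a list of prompts and returns one output per prompt in the same order.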

📚 Documentation

Training and performance

For more information on training and performance, see the following resources:

Citation

@misc{yu2025dapoopensourcellmreinforcement,
      title={DAPO: An Open-Source LLM Reinforcement Learning System at Scale}, 
      author={Qiying Yu and Zheng Zhang and Ruofei Zhu and Yufeng Yuan and Xiaochen Zuo and Yu Yue and Tiantian Fan and Gaohong Liu and Lingjun Liu and Xin Liu and Haibin Lin and Zhiqi Lin and Bole Ma and Guangming Sheng and Yuxuan Tong and Chi Zhang and Mofan Zhang and Wang Zhang and Hang Zhu and Jinhua Zhu and Jiaze Chen and Jiangjie Chen and Chengyi Wang and Hongli Yu and Weinan Dai and Yuxuan Song and Xiangpeng Wei and Hao Zhou and Jingjing Liu and Wei-Ying Ma and Ya-Qin Zhang and Lin Yan and Mu Qiao and Yonghui Wu and Mingxuan Wang},
      year={2025},
      eprint={2503.14476},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2503.14476}, 
}