AceMath-7B-Instruct: an open-source math reasoning model for English math problems, free to deploy

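AceMath 7B Instruct

Developed by NVIDIA

AceMath-7B-Instruct is an instruction-tuned model from NVIDIA built specifically for math reasoning. It is based on an improved Qwen architecture and solves English math problems through chain-of-thought (CoT) reasoning.

Large language model

Safetensors

English #MathReasoning #ChainOfThoughtOptimization #MultiStageFineTuning

Downloads: 1,454

Released: 1/13/2025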

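Model overview

The AceMath family is built specifically for math reasoning and includes instruction models and reward models at several sizes. The instruction models solve math problems through chain-of-thought reasoning, while the reward models evaluate and score math solutions.

Model highlights

Math-specific optimization

Built specifically for math reasoning, with a multi-stage supervised fine-tuning pipeline that improves math problem solving.

Chain-of-thought reasoning

Solves complex math problems through chain-of-thought (CoT) reasoning.

Strong performance

Across a range of math reasoning benchmarks, the 7B model outperforms the previous best-in-class Qwen2.5-Math-7B-Instruct (average pass@1: 67.2 vs. 62.9) and comes close to the roughly 10x larger Qwen2.5-Math-72B-Instruct (67.2 vs. 68.2).

Open training data

All training data is released to support further research.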

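Model capabilities

Math problem solving

Chain-of-thought reasoning

English text generation

Use cases

Education

Math problem solving

Helps students understand and solve complex math problems.

Performs strongly across multiple math reasoning benchmarks.

Research

Math reasoning research

Supports research on math reasoning and chain-of-thought methods.

The released training data can be used for further research.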

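🚀 AceMath - Frontier Math Reasoning Models

AceMath is a family of frontier models built specifically for math reasoning.

🚀 Quick start

Introduction

The AceMath family includes AceMath-1.5B/7B/72B-Instruct and AceMath-7B/72B-RM, all improved from Qwen. The AceMath-1.5B/7B/72B-Instruct models solve English math problems using chain-of-thought (CoT) reasoning, while the AceMath-7B/72B-RM models are outcome reward models that evaluate and score math solutions.

The AceMath-1.5B/7B/72B-Instruct models are developed from the Qwen2.5-Math-1.5B/7B/72B-Base models through a multi-stage supervised fine-tuning (SFT) process: first with general-purpose SFT data, then with math-specific SFT data. We are releasing all training data to support further research in this field.

We recommend using the AceMath models only for solving math problems. To support other tasks, we also release AceInstruct-1.5B/7B/72B, a series of general-purpose SFT models designed to handle code, math, and general knowledge tasks. These models are built on Qwen2.5-1.5B/7B/72B-Base.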
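The recommendation above can be sketched as a small lookup helper. This is only an illustration: `pick_repo_id` and its `task` labels are hypothetical names introduced here, and the repository IDs follow the naming pattern of the models linked in the resources section.

```python
VALID_SIZES = ("1.5B", "7B", "72B")


def pick_repo_id(task: str, size: str = "7B") -> str:
    """Map a task label to a Hugging Face repo ID (hypothetical helper).

    Math problems go to AceMath-*-Instruct; everything else (code,
    general knowledge) goes to the general-purpose AceInstruct-* models.
    """
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}, got {size!r}")
    if task == "math":
        return f"nvidia/AceMath-{size}-Instruct"
    return f"nvidia/AceInstruct-{size}"


print(pick_repo_id("math"))          # nvidia/AceMath-7B-Instruct
print(pick_repo_id("code", "1.5B"))  # nvidia/AceInstruct-1.5B
```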

For more information about AceMath, please visit our website and paper.

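✨ Key features

Strong math reasoning

Across a range of math reasoning benchmarks, AceMath-7B-Instruct substantially outperforms the previous best-in-class Qwen2.5-Math-7B-Instruct (average pass@1: 67.2 vs. 62.9) and comes close to the performance of the 10x larger Qwen2.5-Math-72B-Instruct (67.2 vs. 68.2). Notably, our AceMath-72B-Instruct outperforms the state-of-the-art Qwen2.5-Math-72B-Instruct (71.8 vs. 68.2), GPT-4o (67.4), and Claude 3.5 Sonnet (65.6) by a clear margin.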
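To make the size of these margins explicit, the short sketch below computes the differences between the average pass@1 scores quoted above. The numbers are copied from this paragraph; the `gap` helper is a name introduced here for illustration.

```python
# Average pass@1 scores as quoted in the paragraph above.
reported = {
    "AceMath-7B-Instruct": 67.2,
    "AceMath-72B-Instruct": 71.8,
    "Qwen2.5-Math-7B-Instruct": 62.9,
    "Qwen2.5-Math-72B-Instruct": 68.2,
    "GPT-4o": 67.4,
    "Claude 3.5 Sonnet": 65.6,
}


def gap(model: str, baseline: str) -> float:
    """Difference in average pass@1 (model minus baseline), in points."""
    return round(reported[model] - reported[baseline], 1)


print(gap("AceMath-7B-Instruct", "Qwen2.5-Math-7B-Instruct"))    # 4.3
print(gap("AceMath-7B-Instruct", "Qwen2.5-Math-72B-Instruct"))   # -1.0
print(gap("AceMath-72B-Instruct", "Qwen2.5-Math-72B-Instruct"))  # 3.6
```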

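Strong reward model

Our reward model, AceMath-72B-RM, sets a new record for rm@8 accuracy (best of 8) on these reasoning benchmarks, excluding OpenAI's o1 model, which relies on scaled inference computation.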
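As a rough illustration of what a best-of-n metric such as rm@8 measures: for each problem, several candidate solutions are sampled, a reward model scores each one, and the top-scored candidate is checked against the reference answer. The sketch below assumes the scores are already available; the function names, dictionary keys, and toy data are invented for illustration and are not the authors' evaluation code or benchmark results.

```python
from typing import Sequence


def best_of_n(candidates: Sequence[str], scores: Sequence[float]) -> str:
    """Return the candidate with the highest reward score (ties: first one wins)."""
    if not candidates or len(candidates) != len(scores):
        raise ValueError("need at least one candidate and one score per candidate")
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index]


def rm_at_n_accuracy(problems: Sequence[dict]) -> float:
    """Fraction of problems whose top-scored candidate matches the reference.

    Each problem is a dict with hypothetical keys:
      "answers":   final answers of the n sampled solutions,
      "scores":    reward-model scores for those solutions,
      "reference": the ground-truth final answer.
    """
    if not problems:
        raise ValueError("no problems given")
    hits = sum(
        best_of_n(p["answers"], p["scores"]) == p["reference"] for p in problems
    )
    return hits / len(problems)


# Toy data, invented for illustration only.
toy_problems = [
    {"answers": ["116", "115", "116"], "scores": [0.91, 0.40, 0.88], "reference": "116"},
    {"answers": ["7", "9"], "scores": [0.80, 0.30], "reference": "9"},
]
print(rm_at_n_accuracy(toy_problems))  # 0.5
```

With n = 8 candidates per problem, this quantity corresponds to the "best of 8" reading of rm@8 given above.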

📦 All resources

AceMath instruction models

[AceMath-1.5B-Instruct](https://huggingface.co/nvidia/AceMath-1.5B-Instruct)
[AceMath-7B-Instruct](https://huggingface.co/nvidia/AceMath-7B-Instruct)
[AceMath-72B-Instruct](https://huggingface.co/nvidia/AceMath-72B-Instruct)

AceMath reward models

[AceMath-7B-RM](https://huggingface.co/nvidia/AceMath-7B-RM)
[AceMath-72B-RM](https://huggingface.co/nvidia/AceMath-72B-RM)

Evaluation and training data

[AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench)
[AceMath-Instruct training data](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data)
[AceMath-RM training data](https://huggingface.co/datasets/nvidia/AceMath-RM-Training-Data)

General-purpose instruction models

[AceInstruct-1.5B](https://huggingface.co/nvidia/AceInstruct-1.5B)
[AceInstruct-7B](https://huggingface.co/nvidia/AceInstruct-7B)
[AceInstruct-72B](https://huggingface.co/nvidia/AceInstruct-72B)

📊 Benchmark results (AceMath-Instruct + AceMath-72B-RM)

![AceMath benchmark results](acemath-pic.png)

In the table above, we compare AceMath against leading proprietary and open-access math models. Our AceMath-7B-Instruct substantially outperforms the previous best-in-class Qwen2.5-Math-7B-Instruct across a range of math reasoning benchmarks and comes close to the performance of the 10x larger Qwen2.5-Math-72B-Instruct. Notably, our AceMath-72B-Instruct outperforms the state-of-the-art Qwen2.5-Math-72B-Instruct, GPT-4o, and Claude 3.5 Sonnet by a clear margin. We also report the rm@8 accuracy achieved by our reward model, AceMath-72B-RM, which sets a new record on these reasoning benchmarks.

💻 Usage example

Basic usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nvidia/AceMath-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" lets the library decide where the weights are placed
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

prompt = "Jen enters a lottery by picking $4$ distinct numbers from $S=\\{1,2,3,\\cdots,9,10\\}.$ $4$ numbers are randomly chosen from $S.$ She wins a prize if at least two of her numbers were $2$ of the randomly chosen numbers, and wins the grand prize if all four of her numbers were the randomly chosen numbers. The probability of her winning the grand prize given that she won a prize is $\\tfrac{m}{n}$ where $m$ and $n$ are relatively prime positive integers. Find $m+n$."
messages = [{"role": "user", "content": prompt}]

# Render the chat messages into the prompt string the model expects
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# Move the inputs to the model's device instead of assuming "cuda"
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048
)
# Drop the prompt tokens so that only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
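
The `response` produced above is free-form chain-of-thought text, so a script usually needs to pull out the final answer. The sketch below assumes the model wraps its final answer in `\boxed{...}`, a common convention for math models that this page does not confirm for AceMath. `extract_last_boxed` is a hypothetical helper, and `sample` is a hand-written stand-in, not real model output.

```python
from typing import Optional


def extract_last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    content_start = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    for i in range(content_start, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[content_start:i]
    return None  # braces never closed


# Hand-written stand-in for a model response (assumption, not real output).
sample = "So the probability is $\\tfrac{1}{115}$ and $m+n = \\boxed{116}$."
print(extract_last_boxed(sample))  # 116
```

Brace counting is used instead of a simple regular expression so that nested LaTeX such as `\boxed{\frac{1}{2}}` is returned whole.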

📬 Contact

Zihan Liu (zihanl@nvidia.com)
Yang Chen (yachen@nvidia.com)
Wei Ping (wping@nvidia.com)

📚 Citation

If you find our work helpful, please cite our paper:

@article{acemath2024,
  title={AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling},
  author={Liu, Zihan and Chen, Yang and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint},
  year={2024}
}

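📄 License

All models in the AceMath family are for non-commercial use only and are subject to the [OpenAI data terms of use](https://openai.com/policies/row-terms-of-use/). We release the AceMath models under the [Creative Commons Attribution-NonCommercial 4.0 International license](https://spdx.org/licenses/CC-BY-NC-4.0).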