OpenMath-Nemotron-14B
OpenMath-Nemotron-14B is created by fine-tuning Qwen/Qwen2.5-14B on the OpenMathReasoning dataset. The model is ready for commercial use and achieves state-of-the-art results on popular mathematical benchmarks.

🚀 Quick Start
OpenMath-Nemotron models achieve state-of-the-art results on popular mathematical benchmarks. We report metrics as pass@1 (maj@64), where pass@1 is the average accuracy across 64 generations and maj@64 is the accuracy of majority voting over those 64 generations (a minimal computation sketch follows the table below). For more details on the evaluation setup, please refer to our paper.
| Model | AIME24 | AIME25 | HMMT-24-25 | HLE-Math |
|:---|:---:|:---:|:---:|:---:|
| DeepSeek-R1-Distill-Qwen-1.5B | 26.8 (60.0) | 21.4 (36.7) | 14.2 (26.5) | 2.9 (5.0) |
| OpenMath-Nemotron-1.5B CoT | 61.6 (80.0) | 49.5 (66.7) | 39.9 (53.6) | 5.4 (5.4) |
| OpenMath-Nemotron-1.5B TIR | 52.0 (83.3) | 39.7 (70.0) | 37.2 (60.7) | 2.5 (6.2) |
| + Self GenSelect | 83.3 | 70.0 | 62.2 | 7.9 |
| + 32B GenSelect | 83.3 | 70.0 | 62.8 | 8.3 |
| DeepSeek-R1-Distill-Qwen-7B | 54.4 (80.0) | 38.6 (53.3) | 30.6 (42.9) | 3.3 (5.2) |
| OpenMath-Nemotron-7B CoT | 74.8 (80.0) | 61.2 (76.7) | 49.7 (57.7) | 6.6 (6.6) |
| OpenMath-Nemotron-7B TIR | 72.9 (83.3) | 57.5 (76.7) | 54.6 (66.3) | 7.8 (10.8) |
| + Self GenSelect | 86.7 | 76.7 | 68.4 | 11.5 |
| + 32B GenSelect | 86.7 | 76.7 | 69.9 | 11.9 |
| DeepSeek-R1-Distill-Qwen-14B | 65.8 (80.0) | 48.4 (60.0) | 40.1 (52.0) | 4.2 (4.8) |
| OpenMath-Nemotron-14B-MIX (kaggle) | 73.7 (86.7) | 57.9 (73.3) | 50.5 (64.8) | 5.7 (6.5) |
| OpenMath-Nemotron-14B CoT | 76.3 (83.3) | 63.0 (76.7) | 52.1 (60.7) | 7.5 (7.6) |
| OpenMath-Nemotron-14B TIR | 76.3 (86.7) | 61.3 (76.7) | 58.6 (70.9) | 9.5 (11.5) |
| + Self GenSelect | 86.7 | 76.7 | 72.4 | 14.1 |
| + 32B GenSelect | 90.0 | 76.7 | 71.9 | 13.7 |
| QwQ-32B | 78.1 (86.7) | 66.5 (76.7) | 55.9 (63.3) | 9.0 (9.5) |
| DeepSeek-R1-Distill-Qwen-32B | 66.9 (83.3) | 51.8 (73.3) | 39.9 (51.0) | 4.8 (6.0) |
| OpenMath-Nemotron-32B CoT | 76.5 (86.7) | 62.5 (73.3) | 53.0 (59.2) | 8.3 (8.3) |
| OpenMath-Nemotron-32B TIR | 78.4 (93.3) | 64.2 (76.7) | 59.7 (70.9) | 9.2 (12.5) |
| + Self GenSelect | 93.3 | 80.0 | 73.5 | 15.7 |
| DeepSeek-R1 | 79.1 (86.7) | 64.3 (73.3) | 53.0 (59.2) | 10.5 (11.4) |
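For reference, here is a minimal sketch of how the two metrics can be computed from per-problem generations. It is illustrative only: the function names and the toy data below are ours, not the paper's evaluation harness.

```python
from collections import Counter

def pass_at_1(answers, reference):
    # pass@1: average accuracy across all sampled generations.
    return sum(a == reference for a in answers) / len(answers)

def maj_at_k(answers, reference):
    # maj@k: accuracy of the single most frequent answer (majority voting).
    majority, _ = Counter(answers).most_common(1)[0]
    return float(majority == reference)

# 64 hypothetical extracted answers for one problem.
answers = ["-16"] * 50 + ["16"] * 14
print(pass_at_1(answers, "-16"))  # 0.78125
print(maj_at_k(answers, "-16"))   # 1.0
```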
We used a version of the OpenMath-Nemotron-14B model to secure first place in the AIMO-2 Kaggle competition!
✨ Features
- Fine-tuned on Math Data: Trained on the OpenMathReasoning dataset for strong mathematical reasoning performance.
- Commercial Use Ready: Suitable for various commercial applications.
- State-of-the-Art Performance: Achieves top results on popular mathematical benchmarks.
📦 Installation
The pipeline used to produce the data and models is fully open-sourced!
We provide all instructions to fully reproduce our results, including data generation. To run the usage examples below, you only need recent versions of the transformers, torch, and accelerate packages (accelerate backs the device_map="auto" setting).
💻 Usage Examples
Basic Usage
Our models can be used in three inference modes: chain-of-thought (CoT), tool-integrated reasoning (TIR), and generative solution selection (GenSelect).
To run inference in CoT mode, you can use the following snippet:
```python
import transformers
import torch

model_id = "nvidia/OpenMath-Nemotron-14B"

# Build a text-generation pipeline in bfloat16, sharded across available GPUs.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# CoT prompt: ask the model to put the final answer inside \boxed{}.
messages = [
    {
        "role": "user",
        "content": "Solve the following math problem. Make sure to put the answer (and only answer) inside \\boxed{}.\n\n"
        + "What is the minimum value of $a^2+6a-7$?",
    },
]

outputs = pipeline(
    messages,
    max_new_tokens=4096,
)
# The pipeline returns the full conversation; the last message is the model's reply.
print(outputs[0]["generated_text"][-1]["content"])
```
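Since the prompt asks for the final answer inside \boxed{}, it can be pulled out of the generated text with a small helper. This is a hedged sketch: extract_boxed is an illustrative name of ours, not a library function, and the simple regex does not handle nested braces inside \boxed{}.

```python
import re

def extract_boxed(text):
    # Return the content of the last \boxed{...} in the text, if any.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None

print(extract_boxed(r"The vertex is at $a=-3$, so the minimum is \boxed{-16}."))  # -16
```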
Advanced Usage
To run inference with TIR or GenSelect modes, we highly recommend using our reference implementation in NeMo-Skills.
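If NeMo-Skills is not an option, the sketch below shows the general shape of a TIR loop: generate, execute any Python the model emits, feed the output back, and repeat. Everything here is an assumption of ours (the prompt wording, the fenced-block format the code is extracted from, and the round cap); it illustrates the technique rather than reproducing the reference implementation, and model-written code should only ever be executed in a sandbox.

```python
import re
import subprocess

import torch
import transformers

pipeline = transformers.pipeline(
    "text-generation",
    model="nvidia/OpenMath-Nemotron-14B",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Illustrative prompt; the exact TIR prompt format is defined in NeMo-Skills.
messages = [{
    "role": "user",
    "content": "Solve the following problem, writing Python code where helpful. "
               "Put the final answer inside \\boxed{}.\n\n"
               "What is the sum of the first 100 primes?",
}]

for _ in range(4):  # cap the number of generate/execute rounds
    reply = pipeline(messages, max_new_tokens=2048)[0]["generated_text"][-1]["content"]
    messages.append({"role": "assistant", "content": reply})
    code_blocks = re.findall(r"```python\n(.*?)```", reply, re.DOTALL)
    if not code_blocks or "\\boxed" in reply:
        break  # no tool call left, or the final answer has been produced
    # Run the last emitted block in a subprocess (use a real sandbox in practice)
    # and feed its output back to the model as the tool result.
    result = subprocess.run(
        ["python", "-c", code_blocks[-1]],
        capture_output=True, text=True, timeout=30,
    )
    messages.append(
        {"role": "user", "content": "Code output:\n" + (result.stdout or result.stderr)}
    )

print(reply)  # the last assistant turn, which should contain \boxed{...}
```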
⚠️ Important Note
These models have not been instruction tuned on general data and thus might not provide good answers outside of the math domain.
📚 Documentation
Additional Information
Information Table
| Property | Details |
|:---|:---|
| Model Type | Transformer decoder-only language model (Qwen2.5 architecture) |
| Training Data | OpenMathReasoning |
| Deployment Geography | Global |
| Use Case | Facilitate research in the area of mathematical reasoning |
| Release Date | Hugging Face: 04/23/2025 |
| Input Type | Text (string format, one-dimensional, context length up to 131,072 tokens) |
| Output Type | Text (string format, one-dimensional, context length up to 131,072 tokens) |
| Runtime Engine | TensorRT / Triton |
| Supported Hardware | NVIDIA Ampere, NVIDIA Hopper |
| Preferred OS | Linux |
| Model Versions | OpenMath-Nemotron-1.5B, OpenMath-Nemotron-7B, OpenMath-Nemotron-14B, OpenMath-Nemotron-32B |
License/Terms of Use
GOVERNING TERMS: Use of this model is governed by CC-BY-4.0. Additional Information: Apache License Version 2.0.
📄 Citation
```bibtex
@article{moshkov2025aimo2,
  title   = {AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset},
  author  = {Ivan Moshkov and Darragh Hanley and Ivan Sorokin and Shubham Toshniwal and Christof Henkel and Benedikt Schifferer and Wei Du and Igor Gitman},
  year    = {2025},
  journal = {arXiv preprint arXiv:2504.16891}
}
```
Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.
Please report security vulnerabilities or NVIDIA AI Concerns here.