# OpenCodeReasoning-Nemotron-32B-IOI
OpenCodeReasoning-Nemotron-32B-IOI is a large language model derived from Qwen2.5-32B-Instruct, post-trained for code generation reasoning and supporting a 32K token context length.
## 🚀 Quick Start
### Prerequisites
- Install the `transformers` library.
- Ensure you have a compatible NVIDIA GPU and relevant CUDA libraries installed.
### Running Inference

Here are examples of running inference on coding problems in different programming languages:

#### For C++ Programs
```python
import transformers
import torch

model_id = "nvidia/OpenCodeReasoning-Nemotron-32B-IOI"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

prompt = """You are a helpful and harmless assistant. You should think step-by-step before responding to the instruction below.

Please use c++ programming language only.

You must use ```cpp for just the final solution code block with the following format:
```cpp
// Your code here
```

{user}
"""

messages = [
    {
        "role": "user",
        "content": prompt.format(user="Write a program to calculate the sum of the first $N$ fibonacci numbers"),
    },
]

outputs = pipeline(
    messages,
    max_new_tokens=32768,
)
print(outputs[0]["generated_text"][-1]["content"])
```
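The prompt instructs the model to put its final answer in a fenced `cpp` code block, so a small post-processing step is usually needed to pull the solution out of the (often long) reasoning trace. Below is a minimal sketch, assuming the model follows the requested format; `extract_cpp` is an illustrative helper, not part of the model's API:

```python
import re

FENCE = "`" * 3  # build the fence programmatically so this example stays self-contained

# Match fenced cpp blocks; per the prompt format, the last one is the final solution.
CPP_BLOCK = re.compile(FENCE + r"cpp\s*\n(.*?)" + FENCE, re.DOTALL)

def extract_cpp(response: str):
    """Return the last fenced cpp block in `response`, or None if absent."""
    blocks = CPP_BLOCK.findall(response)
    return blocks[-1].strip() if blocks else None

solution = extract_cpp(outputs[0]["generated_text"][-1]["content"])
```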
#### For Python Programs
```python
import transformers
import torch

model_id = "nvidia/OpenCodeReasoning-Nemotron-32B"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

prompt = """You are a helpful and harmless assistant. You should think step-by-step before responding to the instruction below.

Please use python programming language only.

You must use ```python for just the final solution code block with the following format:
```python
# Your code here
```

{user}
"""

messages = [
    {
        "role": "user",
        "content": prompt.format(user="Write a program to calculate the sum of the first $N$ fibonacci numbers"),
    },
]

outputs = pipeline(
    messages,
    max_new_tokens=32768,
)
print(outputs[0]["generated_text"][-1]["content"])
```
## ✨ Features
- **Based on a Strong Foundation**: Derived from Qwen2.5-32B-Instruct, it inherits the powerful capabilities of the base model.
- **Code Generation Reasoning**: Specifically post-trained for code generation reasoning, enabling better performance in competitive coding scenarios.
- **Long Context Support**: Supports a context length of up to 32K tokens, allowing for more comprehensive input and output.
- **High Performance on NVIDIA Systems**: Designed and optimized to run on NVIDIA GPU-accelerated systems, achieving faster training and inference times.
## 📦 Installation
The model can be easily used through the `transformers` library. You can install it using the following command:
```bash
pip install transformers
```
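As a quick sanity check (a minimal sketch; it assumes the install succeeded and, for GPU inference, that CUDA is visible to PyTorch):

```python
import torch
import transformers

# Verify the library version and that a GPU is available for bfloat16 inference.
print(transformers.__version__)
print(torch.cuda.is_available())
```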
## 📚 Documentation
### Model Overview
OpenCodeReasoning-Nemotron-32B-IOI is a large language model for code generation reasoning. It is a derivative of Qwen2.5-32B-Instruct and has 32 billion parameters.
The following table shows the average results of 64 evaluations on each benchmark:
| Model | Dataset Size (Python) | Dataset Size (C++) | LiveCodeBench (pass@1) | CodeContests (pass@1) | IOI (Total Score) |
| --- | --- | --- | --- | --- | --- |
| OlympicCoder-7B | 0 | 100K | 40.9 | 10.6 | 127 |
| OlympicCoder-32B | 0 | 100K | 57.4 | 18.0 | 153.5 |
| QWQ-32B | - | - | 61.3 | 20.2 | 175.5 |
| **OpenCodeReasoning-IOI** | | | | | |
| OCR-Qwen-32B-Instruct | 736K | 356K | 61.5 | 25.5 | 175.5 |
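For context on how such numbers are computed: pass@1 here is the average over 64 generations per problem. A minimal sketch (not the official evaluation harness) using the standard unbiased pass@k estimator from the Codex paper, which for k=1 reduces to the fraction of correct samples:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical per-problem correct counts out of 64 generations.
correct_counts = [40, 0, 64, 12]
pass_at_1 = sum(pass_at_k(64, c, 1) for c in correct_counts) / len(correct_counts)
print(f"benchmark pass@1 = {pass_at_1:.3f}")  # mean of c/64 across problems
```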
### Reproducing Results
### Model Architecture
- Architecture Type: Dense decoder-only Transformer model.
- Network Architecture: Qwen-32B-Instruct.
### Input
- Input Type(s): Text
- Input Format(s): String
- Input Parameters: One-Dimensional (1D)
- Other Properties Related to Input: Context length up to 32,768 tokens
### Output
- Output Type(s): Text
- Output Format: String
- Output Parameters: One-Dimensional (1D)
- Other Properties Related to Output: Context length up to 32,768 tokens (see the token-budget sketch below)
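Since the prompt and the generated output both count against the 32,768-token window of a decoder-only model, it can help to check prompt length before requesting very long generations. A minimal sketch (assuming the checkpoint ships a standard Hugging Face tokenizer):

```python
from transformers import AutoTokenizer

MAX_CONTEXT = 32768  # model context window in tokens
tokenizer = AutoTokenizer.from_pretrained("nvidia/OpenCodeReasoning-Nemotron-32B-IOI")

prompt = "Write a program to calculate the sum of the first $N$ fibonacci numbers"
prompt_tokens = len(tokenizer(prompt)["input_ids"])

# Whatever the prompt consumes is no longer available for the reasoning trace.
max_new_tokens = MAX_CONTEXT - prompt_tokens
print(f"{prompt_tokens} prompt tokens, up to {max_new_tokens} tokens left for generation")
```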
### Software Integration
- Runtime Engine: NeMo 2.3.0
- Recommended Hardware Microarchitecture Compatibility: NVIDIA Ampere, NVIDIA Hopper
- Preferred/Supported Operating System(s): Linux
### Model Version(s)

- 1.0 (4/25/2025):
  - OpenCodeReasoning-Nemotron-7B
  - OpenCodeReasoning-Nemotron-14B
  - OpenCodeReasoning-Nemotron-32B
  - OpenCodeReasoning-Nemotron-32B-IOI
### Training and Evaluation Datasets
#### Training Dataset
The training corpus is the OpenCodeReasoning dataset, which consists of competitive programming questions and DeepSeek-R1 generated responses.
- Data Collection Method: Hybrid: Automated, Human, Synthetic
- Labeling Method: Hybrid: Automated, Human, Synthetic
- Properties: 736k samples from OpenCodeReasoning
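To inspect the corpus, the dataset can be pulled from the Hugging Face Hub. A minimal sketch (the dataset id matches the OpenCodeReasoning release, but the `split_0` config and split names are assumptions that may need adjusting against the dataset card):

```python
from datasets import load_dataset

# Config/split names are assumptions; check the dataset card if they differ.
ds = load_dataset("nvidia/OpenCodeReasoning", "split_0", split="split_0")
print(len(ds), ds.column_names)
```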
#### Evaluation Dataset
The benchmarks reported in the table above (LiveCodeBench, CodeContests, and IOI) are used for evaluation.
- Data Collection Method: Hybrid: Automated, Human, Synthetic
- Labeling Method: Hybrid: Automated, Human, Synthetic
### License/Terms of Use

The use of this model is governed by the Apache 2.0 license.
### Deployment Geography
Global
### Use Case
This model is intended for developers and researchers building LLMs.
### Release Date

Hugging Face: 04/25/2025 via https://huggingface.co/nvidia/OpenCodeReasoning-Nemotron-32B/
### Inference
- Engine: vLLM
- Test Hardware: NVIDIA H100-80GB
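A minimal serving sketch with the vLLM engine named above (the sampling values are illustrative assumptions; in practice you would wrap the question in the prompt template from the Quick Start section):

```python
from vllm import LLM, SamplingParams

# Load the model with its full 32K context window.
llm = LLM(model="nvidia/OpenCodeReasoning-Nemotron-32B-IOI", max_model_len=32768)
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=30000)

prompts = ["Write a program to calculate the sum of the first $N$ fibonacci numbers"]
for out in llm.generate(prompts, params):
    print(out.outputs[0].text)
```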
## 🔧 Technical Details
### Model Architecture
The model is a dense decoder-only Transformer based on the Qwen-32B-Instruct architecture. Its 32 billion parameters provide a strong foundation for code generation reasoning.
### Input and Output
The model accepts text input as strings (one-dimensional sequences) with a context length of up to 32,768 tokens. The output is likewise text in string form, with the same context-length support.
### Training and Optimization
The model is post-trained on the OpenCodeReasoning dataset for code generation reasoning. It is designed and optimized to run on NVIDIA GPU-accelerated systems, leveraging NVIDIA's hardware and software frameworks to achieve faster training and inference times.
## 📄 License
This model is licensed under the Apache 2.0 license.
## Citation
If you find the data useful, please cite:
```bibtex
@article{ahmad2025opencodereasoning,
  title={OpenCodeReasoning: Advancing Data Distillation for Competitive Coding},
  author={Wasi Uddin Ahmad and Sean Narenthiran and Somshubra Majumdar and Aleksander Ficek and Siddhartha Jain and Jocelyn Huang and Vahid Noroozi and Boris Ginsburg},
  year={2025},
  eprint={2504.01943},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2504.01943},
}
```
## Additional Information
### Ethical Considerations
NVIDIA believes that Trustworthy AI is a shared responsibility. When using this model, developers should work with their internal model team to ensure it meets the requirements of the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns as appropriate.
## Model Performance Visualization

![Benchmark results for OpenCodeReasoning-Nemotron-32B-IOI](./results.png)