🚀 Nano Aha Moment 3B Model
A 3B parameter language model trained for mathematical reasoning tasks, especially the Countdown game.
🚀 Quick Start
The model repository is available at https://github.com/McGill-NLP/nano-aha-moment. You can interactively test the model's reasoning capabilities using the checkpoint playground notebook in the repository.
✨ Features
- Mathematical Reasoning: Specifically designed to solve the Countdown game, creating equations from a set of numbers to reach a target value.
- Reasoning Display: Shows its reasoning process inside <think> tags and provides the final answer inside <answer> tags.
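To make the task concrete, here is a small illustrative Countdown instance (the numbers and target below are hypothetical examples, not drawn from the training set): given the numbers 19, 36, 55, and 7 with target 65, one valid equation is 55 + 36 - 19 - 7 = 65. A minimal check in Python:

```python
# Illustrative Countdown instance (hypothetical numbers, not from the dataset):
# reach the target using each available number exactly once with + - * /.
numbers = [19, 36, 55, 7]
target = 65

equation = "55 + 36 - 19 - 7"  # one candidate solution

# eval is acceptable here because we control the string; a real checker
# should validate the expression first.
result = eval(equation)
print(result == target)  # 55 + 36 = 91; 91 - 19 = 72; 72 - 7 = 65
```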
💻 Usage Examples
Basic Usage
The model is used to solve the Countdown game. You can use the checkpoint playground notebook to test it interactively.
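A minimal sketch of how a Countdown prompt might be assembled for the model. The system message and template wording below are assumptions for illustration, not the verbatim strings from the codebase; consult the nano-aha-moment repository for the exact prompt.

```python
# Hypothetical prompt construction for the Countdown task.
# The system message and template wording are assumptions; see the
# nano-aha-moment repository for the exact strings used in training.
SYSTEM_MESSAGE = (
    "You are a helpful assistant. You first think about the reasoning "
    "process and then provide the answer."
)

def build_prompt(numbers, target):
    """Build a chat-style message list for one Countdown problem."""
    user_msg = (
        f"Using the numbers {numbers}, create an equation that equals "
        f"{target}. You can use basic arithmetic operations (+, -, *, /), "
        "and each number can only be used once. Show your work in "
        "<think> </think> tags and return the final answer in "
        "<answer> </answer> tags."
    )
    return [
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": user_msg},
    ]

messages = build_prompt([19, 36, 55, 7], 65)
print(messages[1]["content"])
```

These messages can then be passed to `tokenizer.apply_chat_template` and a standard Hugging Face `generate` call.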
📚 Documentation
Model Details
Model Description
This is a 3B parameter language model trained with reinforcement learning to solve mathematical reasoning tasks, specifically the Countdown game. The model is based on Qwen2.5-3B and was fine-tuned with GRPO using the nanoAhaMoment codebase.
| Property | Details |
| --- | --- |
| Developed by | McGill-NLP Lab |
| Model Type | Causal Language Model |
| Language(s) (NLP) | English |
| License | MIT |
| Finetuned from model | Qwen/Qwen2.5-3B |
Model Sources
- Repository: https://github.com/McGill-NLP/nano-aha-moment
Uses
Direct Use
The model is designed to solve mathematical reasoning tasks, specifically the Countdown game, where it must create an equation from a given set of numbers that reaches a target value. The model shows its reasoning process inside <think> tags and provides the final answer inside <answer> tags.
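Since outputs follow the <think>/<answer> format, the final answer can be pulled out with a simple regex. This is a sketch: real completions may occasionally omit or malform the tags, so the function returns None in that case.

```python
import re

def extract_answer(completion: str):
    """Return the text inside the first <answer>...</answer> pair, or None."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return match.group(1).strip() if match else None

completion = (
    "<think>55 + 36 = 91, 91 - 19 = 72, 72 - 7 = 65.</think>"
    "<answer>55 + 36 - 19 - 7</answer>"
)
print(extract_answer(completion))  # 55 + 36 - 19 - 7
```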
Out-of-Scope Use
The model is specifically trained for mathematical reasoning tasks and may not perform well on general language tasks or other domains outside its training scope.
Bias, Risks, and Limitations
The model has been trained on a specific mathematical reasoning task and may have limitations in:
- General language understanding and generation
- Handling complex mathematical problems outside the Countdown game format
- Maintaining consistent reasoning across different problem types
Recommendations
💡 Usage Tips
- Use the model specifically for the Countdown game task it was trained on.
- Be aware of the model's focus on mathematical reasoning.
- Consider the model's limitations when applying it to other tasks.
Training Details
Training Data
The model was trained on the Countdown-Tasks-3to4 dataset, which contains problem statements for the Countdown game where the goal is to reach a target number using a set of available numbers and basic arithmetic operations.
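The natural correctness check for this task is that a proposed equation (i) uses exactly the available numbers and (ii) evaluates to the target. Below is a sketch of such a verifier; it illustrates the task's success criterion and is not the repository's actual reward function.

```python
import re

def is_correct(equation: str, numbers: list, target: int) -> bool:
    """Check a Countdown solution: uses exactly the given numbers, hits the target."""
    # Only digits, arithmetic operators, parentheses, and whitespace allowed.
    if not re.fullmatch(r"[\d+\-*/().\s]+", equation):
        return False
    # Each available number must be used exactly once.
    used = [int(n) for n in re.findall(r"\d+", equation)]
    if sorted(used) != sorted(numbers):
        return False
    try:
        # The character whitelist above rules out names and function calls.
        value = eval(equation)
    except (SyntaxError, ZeroDivisionError):
        return False
    return abs(value - target) < 1e-6

print(is_correct("55 + 36 - 19 - 7", [19, 36, 55, 7], 65))  # True
```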
Training Procedure
Preprocessing
The training data was preprocessed to include:
- System message for reasoning guidance
- Structured prompt template for the Countdown game
- Special tags for reasoning steps and answers
Training Hyperparameters
- Training regime: bf16 mixed precision
- Learning rate: 1e-6
- Batch size: 64 episodes per iteration
- Optimizer: AdamW
- KL coefficient: 0.001
- Temperature: 1.0
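For convenience, the hyperparameters above can be collected into a single config dict (values are taken from the list above; the key names are illustrative, not the codebase's own):

```python
# Hyperparameters from the model card; key names are illustrative.
TRAIN_CONFIG = {
    "precision": "bf16",           # mixed-precision training regime
    "learning_rate": 1e-6,
    "episodes_per_iteration": 64,  # batch size
    "optimizer": "AdamW",
    "kl_coefficient": 0.001,
    "temperature": 1.0,            # sampling temperature
}
print(TRAIN_CONFIG["learning_rate"])
```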
Technical Specifications
Model Architecture and Objective
The model is based on the Qwen2.5-3B architecture and uses:
- Flash Attention 2 for efficient attention computation
- DeepSpeed ZeRO Stage 2 for memory optimization
- vLLM for efficient inference
Compute Infrastructure
Software
- PyTorch 2.5.1
- Transformers 4.48.3
- DeepSpeed 0.16.4
- vLLM 0.7.3
- Flash Attention 2.7.2
Citation
BibTeX:
@misc{Kazemnejad2025:NanoAhaMoment,
author = {Amirhossein Kazemnejad and Milad Aghajohari and Alessandro Sordoni and Aaron Courville and Siva Reddy},
title = {Nano Aha! Moment: Single File "RL for LLM" Library},
year = {2025},
howpublished = {\url{https://github.com/McGill-NLP/nano-aha-moment}},
note = {GitHub repository}
}
Model Card Authors
McGill-NLP Lab
Model Card Contact
For questions about this model card, please contact the McGill-NLP Lab.