🚀 Nano Aha Moment 3B Model
A 3B parameter language model trained for mathematical reasoning tasks, especially the Countdown game.
🚀 Quick Start
The model repository is available at https://github.com/McGill-NLP/nano-aha-moment. You can interactively test the model's reasoning capabilities using the checkpoint playground notebook in the repository.
✨ Features
- Mathematical Reasoning: Specifically designed to solve the Countdown game, creating equations from a set of numbers to reach a target value.
- Reasoning Display: Shows its reasoning process inside <think> tags and provides the final answer inside <answer> tags.
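To make the task concrete, here is a small illustrative Countdown instance (the numbers and target below are hypothetical examples, not drawn from the training set): given the numbers 19, 36, 55, and 7 with target 65, one valid equation is 55 + 36 - 19 - 7 = 65. A minimal check in Python:

```python
# Illustrative Countdown instance (hypothetical numbers, not from the dataset):
# reach the target using each available number exactly once with + - * /.
numbers = [19, 36, 55, 7]
target = 65

equation = "55 + 36 - 19 - 7"  # one candidate solution

# eval is acceptable here because we control the string; a real checker
# should validate the expression first.
result = eval(equation)
print(result == target)  # 55 + 36 = 91; 91 - 19 = 72; 72 - 7 = 65
```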
💻 Usage Examples
Basic Usage
The model is used to solve the Countdown game. You can use the checkpoint playground notebook to test it interactively.
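A minimal sketch of how a Countdown prompt might be assembled for the model. The system message and template wording below are assumptions for illustration, not the verbatim strings from the codebase; consult the nano-aha-moment repository for the exact prompt.

```python
# Hypothetical prompt construction for the Countdown task.
# The system message and template wording are assumptions; see the
# nano-aha-moment repository for the exact strings used in training.
SYSTEM_MESSAGE = (
    "You are a helpful assistant. You first think about the reasoning "
    "process and then provide the answer."
)

def build_prompt(numbers, target):
    """Build a chat-style message list for one Countdown problem."""
    user_msg = (
        f"Using the numbers {numbers}, create an equation that equals "
        f"{target}. You can use basic arithmetic operations (+, -, *, /), "
        "and each number can only be used once. Show your work in "
        "<think> </think> tags and return the final answer in "
        "<answer> </answer> tags."
    )
    return [
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": user_msg},
    ]

messages = build_prompt([19, 36, 55, 7], 65)
print(messages[1]["content"])
```

These messages can then be passed to `tokenizer.apply_chat_template` and a standard Hugging Face `generate` call.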
📚 Documentation
Model Details
Model Description
This is a 3B parameter language model trained with reinforcement learning to solve mathematical reasoning tasks, specifically the Countdown game. The model is based on Qwen2.5-3B and was fine-tuned with GRPO using the nanoAhaMoment codebase.
| Property | Details |
| --- | --- |
| Developed by | McGill-NLP Lab |
| Model Type | Causal Language Model |
| Language(s) (NLP) | English |
| License | MIT |
| Finetuned from model | Qwen/Qwen2.5-3B |
Model Sources
- Repository: https://github.com/McGill-NLP/nano-aha-moment
Uses
Direct Use
The model is designed to solve mathematical reasoning tasks, specifically the Countdown game, where it must create an equation from a given set of numbers that reaches a target value. The model shows its reasoning process inside <think> tags and provides the final answer inside <answer> tags.
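Since outputs follow the <think>/<answer> format, the final answer can be pulled out with a simple regex. This is a sketch: real completions may occasionally omit or malform the tags, so the function returns None in that case.

```python
import re

def extract_answer(completion: str):
    """Return the text inside the first <answer>...</answer> pair, or None."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return match.group(1).strip() if match else None

completion = (
    "<think>55 + 36 = 91, 91 - 19 = 72, 72 - 7 = 65.</think>"
    "<answer>55 + 36 - 19 - 7</answer>"
)
print(extract_answer(completion))  # 55 + 36 - 19 - 7
```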
Out-of-Scope Use
The model is specifically trained for mathematical reasoning tasks and may not perform well on general language tasks or other domains outside its training scope.
Bias, Risks, and Limitations
The model has been trained on a specific mathematical reasoning task and may have limitations in:
- General language understanding and generation
- Handling complex mathematical problems outside the Countdown game format
- Maintaining consistent reasoning across different problem types
Recommendations
💡 Usage Tips
- Use the model specifically for the Countdown game task it was trained on.
- Be aware of the model's focus on mathematical reasoning.
- Consider the model's limitations when applying it to other tasks.
Training Details
Training Data
The model was trained on the Countdown-Tasks-3to4 dataset, which contains problem statements for the Countdown game where the goal is to reach a target number using a set of available numbers and basic arithmetic operations.
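The natural correctness check for this task is that a proposed equation (i) uses exactly the available numbers and (ii) evaluates to the target. Below is a sketch of such a verifier; it illustrates the task's success criterion and is not the repository's actual reward function.

```python
import re

def is_correct(equation: str, numbers: list, target: int) -> bool:
    """Check a Countdown solution: uses exactly the given numbers, hits the target."""
    # Only digits, arithmetic operators, parentheses, and whitespace allowed.
    if not re.fullmatch(r"[\d+\-*/().\s]+", equation):
        return False
    # Each available number must be used exactly once.
    used = [int(n) for n in re.findall(r"\d+", equation)]
    if sorted(used) != sorted(numbers):
        return False
    try:
        # The character whitelist above rules out names and function calls.
        value = eval(equation)
    except (SyntaxError, ZeroDivisionError):
        return False
    return abs(value - target) < 1e-6

print(is_correct("55 + 36 - 19 - 7", [19, 36, 55, 7], 65))  # True
```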
Training Procedure
Preprocessing
The training data was preprocessed to include:
- System message for reasoning guidance
- Structured prompt template for the Countdown game
- Special tags for reasoning steps and answers
Training Hyperparameters
- Training regime: bf16 mixed precision
- Learning rate: 1e-6
- Batch size: 64 episodes per iteration
- Optimizer: AdamW
- KL coefficient: 0.001
- Temperature: 1.0
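For convenience, the hyperparameters above can be collected into a single config dict (values are taken from the list above; the key names are illustrative, not the codebase's own):

```python
# Hyperparameters from the model card; key names are illustrative.
TRAIN_CONFIG = {
    "precision": "bf16",           # mixed-precision training regime
    "learning_rate": 1e-6,
    "episodes_per_iteration": 64,  # batch size
    "optimizer": "AdamW",
    "kl_coefficient": 0.001,
    "temperature": 1.0,            # sampling temperature
}
print(TRAIN_CONFIG["learning_rate"])
```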
Technical Specifications
Model Architecture and Objective
The model is based on the Qwen2.5-3B architecture and uses:
- Flash Attention 2 for efficient attention computation
- DeepSpeed ZeRO Stage 2 for memory optimization
- vLLM for efficient inference
Compute Infrastructure
Software
- PyTorch 2.5.1
- Transformers 4.48.3
- DeepSpeed 0.16.4
- vLLM 0.7.3
- Flash Attention 2.7.2
Citation
BibTeX:
@misc{Kazemnejad2025:NanoAhaMoment,
author = {Amirhossein Kazemnejad and Milad Aghajohari and Alessandro Sordoni and Aaron Courville and Siva Reddy},
title = {Nano Aha! Moment: Single File "RL for LLM" Library},
year = {2025},
howpublished = {\url{https://github.com/McGill-NLP/nano-aha-moment}},
note = {GitHub repository}
}
Model Card Authors
McGill-NLP Lab
Model Card Contact
For questions about this model card, please contact the McGill-NLP Lab.