🚀 Open RS Model
This repository provides a model for the Open RS project. The project aims to enhance the reasoning capabilities of small large language models (LLMs) using reinforcement learning under resource-constrained conditions, as presented in the paper Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn’t.
✨ Features
- Enhanced Reasoning: Significantly improves the reasoning ability of small LLMs. For example, AMC23 accuracy increases from 63% to 80%, and AIME24 reaches 46.7%, outperforming o1-preview.
- Cost-Effective Training: Achieves efficient training with only 7,000 samples at a cost of $42, far less than the thousands of dollars required by baseline models.
- Open Source: All code, models, and datasets are open-sourced to support further research.
📦 Installation
No official installation instructions ship with this release; a standard Hugging Face `transformers`/`torch` environment is assumed to be sufficient (see the usage sketch below).
💻 Usage Examples
No official usage examples ship with this release.
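As a minimal, hedged sketch, a checkpoint from this project can be loaded and prompted with the Hugging Face `transformers` API. The model identifier below is a placeholder for illustration, not an ID confirmed by this README; substitute the actual checkpoint published for this release.

```python
# Minimal usage sketch (assumptions: transformers + torch + accelerate installed,
# and a placeholder model ID -- substitute the actual checkpoint for this release).
# Requires: pip install transformers torch accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "knoveleng/Open-RS3"  # placeholder ID, not confirmed by this README

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# The base model (DeepSeek-R1-Distill-Qwen-1.5B) is chat-tuned, so use its chat template.
messages = [{"role": "user", "content": "What is the sum of the first 10 positive integers?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```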
📚 Documentation
Model Summary
This repository hosts the model for the Open RS project, which accompanies the paper Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn’t. The project explores enhancing the reasoning capabilities of small large language models (LLMs) using reinforcement learning (RL) under resource-constrained conditions.
We focus on a 1.5-billion-parameter model, DeepSeek-R1-Distill-Qwen-1.5B, trained on 4 NVIDIA A40 GPUs (48 GB VRAM each) within 24 hours. By adapting the Group Relative Policy Optimization (GRPO) algorithm and leveraging a curated, compact mathematical reasoning dataset, we conducted three experiments to assess performance and behavior. Key findings include:
- Significant reasoning improvements, e.g., AMC23 accuracy rising from 63% to 80% and AIME24 reaching 46.7%, outperforming o1-preview.
- Efficient training with just 7,000 samples at a cost of $42, compared to thousands of dollars for baseline models.
- Challenges like optimization instability and length constraints with extended training.
These results showcase RL-based fine-tuning as a cost-effective approach for small LLMs, making reasoning capabilities accessible in resource-limited settings. We open-source our code, models, and datasets to support further research.
For more details, please refer to our GitHub repository.
Evaluation
Performance Highlights
- Open-RS1: 53.0% avg. score
- Open-RS2: 55.7% avg. score, 80.0% on AMC23
- Open-RS3: 56.3% avg. score, 46.7% on AIME24 (outperforms o1-preview at 44.6%)
- Competitive MATH-500 scores; Minerva performance lags behind 7B models.

Cost Efficiency
Our approach uses 7,000 samples (42,000 total outputs) and costs ~$42 on 4x A40 GPUs in 24 hours, compared to:
- 7B models: Qwen2.5-7B-SimpleRL ($1,633), Eurus-2-7B-PRIME ($1,088)
- 1.5B models: DeepScaleR-1.5B-Preview ($3,629), Still-3-1.5B-Preview ($2,268)


Information Table
| Property | Details |
|----------|---------|
| Model Type | Model for the Open RS project, based on DeepSeek-R1-Distill-Qwen-1.5B |
| Training Data | knoveleng/open-rs, knoveleng/open-s1, knoveleng/open-deepscaler |
| Base Model | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B |
| License | MIT |
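The training datasets listed above are public Hugging Face datasets, so they can be inspected directly with the `datasets` library. The sketch below assumes each dataset exposes a `train` split; adjust if its hub page says otherwise.

```python
# Sketch: inspect the curated training data referenced in the table above.
# Assumption: the dataset exposes a "train" split.
from datasets import load_dataset

ds = load_dataset("knoveleng/open-rs", split="train")
print(ds)     # dataset size and column names
print(ds[0])  # one example record
```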
🔧 Technical Details
Training adapts the Group Relative Policy Optimization (GRPO) algorithm to DeepSeek-R1-Distill-Qwen-1.5B on a curated mathematical reasoning dataset of 7,000 samples, using 4 NVIDIA A40 GPUs (48 GB VRAM each) for roughly 24 hours (~$42 of compute). See the paper and GitHub repository for the full recipe.
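As a rough illustration of this kind of recipe (not the project's exact code), GRPO fine-tuning of the 1.5B base model can be sketched with TRL's `GRPOTrainer`. The reward function, hyperparameters, and dataset column names below are placeholders, not the values used in the paper.

```python
# Hedged sketch of GRPO fine-tuning with TRL (assumes a recent trl release with GRPOTrainer).
# The reward function and hyperparameters are illustrative placeholders, not the paper's settings.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Assumes the dataset exposes a "prompt" column; rename/map columns otherwise.
dataset = load_dataset("knoveleng/open-rs", split="train")

def accuracy_reward(completions, **kwargs):
    """Toy reward: +1 if the completion contains a boxed answer, else 0.
    The actual project scores correctness of the final answer."""
    return [1.0 if "\\boxed" in c else 0.0 for c in completions]

training_args = GRPOConfig(
    output_dir="open-rs-grpo",
    num_generations=6,              # completions sampled per prompt (group size)
    max_completion_length=1024,     # cap on generated reasoning length
    per_device_train_batch_size=6,  # effective batch size must divide evenly by num_generations
    learning_rate=1e-6,
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    reward_funcs=accuracy_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```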
📄 License
This project is licensed under the MIT license.
📚 Citation
If this project aids your work, please cite it as:
```bibtex
@misc{dang2025reinforcementlearningreasoningsmall,
  title={Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't},
  author={Quy-Anh Dang and Chris Ngo},
  year={2025},
  eprint={2503.16219},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2503.16219},
}
```