Open RS1
Developed by knoveleng
A small-scale (1.5B-parameter) large language model fine-tuned with reinforcement learning to strengthen its reasoning capabilities
Downloads: 6,229
Release Time: 3/18/2025
Model Overview
This project explores enhancing the reasoning capabilities of small-scale large language models (LLMs) under resource-constrained conditions using reinforcement learning (RL). It employs the Group Relative Policy Optimization (GRPO) algorithm and is trained on a carefully selected compact mathematical reasoning dataset.
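The overall recipe can be approximated with the GRPO implementation in the TRL library. The sketch below is illustrative only: the dataset repo ID, base model checkpoint, column names, reward function, and hyperparameters are assumptions standing in for the project's actual configuration, not the authors' published setup.

```python
"""Minimal GRPO fine-tuning sketch using TRL.
Assumptions: dataset/model repo IDs, the "prompt"/"answer" column names,
and all hyperparameters are illustrative placeholders."""
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Assumed compact math-reasoning dataset with plain-text "prompt" and "answer" columns.
dataset = load_dataset("knoveleng/open-rs", split="train")

def accuracy_reward(completions, answer, **kwargs):
    """Reward 1.0 if the reference answer string appears in the completion, else 0.0.
    Assumes a standard (plain-text prompt) dataset; a real setup would parse and
    compare the final boxed answer instead of doing a substring match."""
    return [1.0 if str(a) in c else 0.0 for c, a in zip(completions, answer)]

config = GRPOConfig(
    output_dir="open-rs1-grpo",
    per_device_train_batch_size=4,
    num_generations=4,           # group size used for the relative advantage estimate
    max_completion_length=1024,
    learning_rate=1e-6,
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # assumed 1.5B base checkpoint
    reward_funcs=accuracy_reward,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```

GRPO samples a group of completions per prompt and scores each one against the group mean, which avoids training a separate value model and keeps the memory footprint small enough for a handful of GPUs.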
Model Features
Enhanced Efficient Reasoning
Reinforcement learning fine-tuning substantially improves reasoning, with AMC23 accuracy rising from 63% to 80% and AIME24 reaching 46.7%
Low-Cost Training
Trains on only 7,000 samples at a cost of $42, completing within 24 hours on 4 NVIDIA A40 GPUs
Resource Optimization
Designed for resource-constrained environments, significantly reducing computational costs compared to 7B models
Model Capabilities
Mathematical Reasoning
Text Generation
Logical Reasoning
Use Cases
Education
Mathematical Problem Solving
Solving a range of mathematical reasoning problems (see the inference sketch after the use cases)
AMC23 accuracy reaches 80%
Research
Small LLM Capability Validation
Validating the application of reinforcement learning on small-scale models
AIME24 score of 46.7%, surpassing the o1-preview model
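For the mathematical problem-solving use case, the model can be queried like any Hugging Face causal LM. The repo ID, prompt wording, and generation settings below are assumptions chosen for illustration rather than settings published with the model.

```python
"""Illustrative inference sketch; the repo ID, prompt, and sampling
parameters are assumptions, not the authors' recommended settings."""
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "knoveleng/Open-RS1"  # assumed Hugging Face repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Ask for a worked solution with the final answer in \boxed{} so it is easy to extract.
messages = [
    {"role": "user",
     "content": "Compute the number of positive divisors of 2024. "
                "Reason step by step and put the final answer in \\boxed{}."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=2048, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```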