
Open RS (GRPO)

Developed by knoveleng
Open RS is a small-scale language model project that uses reinforcement learning to enhance the mathematical reasoning capabilities of a 1.5B-parameter model while keeping training efficient under resource constraints.
Downloads: 30
Release Date: 3/18/2025

Model Overview

This project explores improving the reasoning abilities of small-scale language models via reinforcement learning (RL), employing the Group Relative Policy Optimization (GRPO) algorithm and training with a curated mathematical reasoning dataset.
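The training setup can be illustrated with a minimal sketch, assuming Hugging Face TRL's GRPOTrainer, a placeholder dataset with "prompt" and "answer" columns, a placeholder reward function, and a placeholder 1.5B base checkpoint; the project's actual scripts and hyperparameters may differ.

```python
# Minimal GRPO training sketch (assumptions: TRL's GRPOTrainer, a placeholder
# math dataset with "prompt" and "answer" columns, and a placeholder 1.5B model id).
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def accuracy_reward(completions, answer, **kwargs):
    # Hypothetical reward: 1.0 when the reference answer appears in the completion.
    return [1.0 if str(a) in c else 0.0 for c, a in zip(completions, answer)]

# Placeholder dataset id; Open RS trains on a curated ~7k-sample math reasoning set.
dataset = load_dataset("your-org/math-reasoning-7k", split="train")

config = GRPOConfig(
    output_dir="open-rs-grpo",
    num_generations=8,             # completions sampled per prompt for group-relative advantages
    max_completion_length=1024,
    per_device_train_batch_size=4,
    learning_rate=1e-6,
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # placeholder 1.5B base model
    reward_funcs=accuracy_reward,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```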

Model Features

Efficient Reinforcement Learning Training
Training completed within 24 hours using only 7,000 samples at a cost of $42.
Significant Reasoning Improvement
AMC23 accuracy increased from 63% to 80%, while AIME24 achieved 46.7%, surpassing baseline models.
Resource-Friendly Optimization
Training can be completed with just 4 NVIDIA A40 GPUs (each with 48GB VRAM).

Model Capabilities

Mathematical Problem Solving
Logical Reasoning
Text Generation

Use Cases

Education
Math Competition Problem Solving
Solving AMC/AIME and other math competition problems (see the inference sketch after this list)
AMC23 accuracy of 80%; AIME24 accuracy of 46.7%
Research
Small Model Optimization Research
Exploring model optimization methods under resource constraints
Validating the effectiveness of RL methods for small models
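As a usage illustration, the following is a minimal inference sketch assuming the Hugging Face Transformers text-generation pipeline; the checkpoint id is a placeholder and should be replaced with the actual released Open RS model.

```python
# Minimal inference sketch (assumption: a placeholder Hugging Face checkpoint id;
# substitute the actual released Open RS model).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="knoveleng/Open-RS1",  # placeholder checkpoint name
    torch_dtype="auto",
    device_map="auto",
)

problem = "How many positive integers n <= 1000 make n^2 + 1 divisible by 5?"
messages = [{"role": "user", "content": problem}]
output = generator(messages, max_new_tokens=2048, do_sample=True, temperature=0.6)

# The pipeline returns the full chat; the last message holds the model's solution.
print(output[0]["generated_text"][-1]["content"])
```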