OpenRS3-GRPO-ja Open-source AI Model - Free Deployment to Boost Japanese Mathematical Reasoning Tasks

OpenRS3-GRPO-ja

Developed by EQUES

OpenRS3-GRPO-ja is a version of SakanaAI/TinySwallow-1.5B-Instruct fine-tuned with the GRPO method on a Japanese mathematical instruction dataset. It targets mathematical reasoning tasks.

Large Language Model

Transformers

#Mathematical Reasoning Optimization #Japanese Instruction Fine-tuning #GRPO Training

Downloads 25

Release Time: 4/4/2025

Model Overview

This model is a Japanese language model specifically optimized for mathematical reasoning tasks, suitable for generating responses to mathematical instructions.

Model Features

GRPO Training Method

Trained using the GRPO method proposed in the DeepSeekMath paper to optimize mathematical reasoning capabilities.

Japanese Mathematical Instruction Optimization

Fine-tuned on the kunishou/OpenMathInstruct-1-1.8m-ja Japanese mathematical instruction dataset to improve its handling of Japanese mathematical problems.

TRL Framework Training

Trained using the TRL (Transformer Reinforcement Learning) library for a total of 300 training steps.

Model Capabilities

Japanese text generation

Mathematical problem solving

Instruction understanding and response

Use Cases

Education

Mathematical Problem Solving

Helps students understand and solve mathematical problems

Generates detailed problem-solving steps and explanations

Research

Mathematical Reasoning Research

Used for research and evaluation of mathematical reasoning abilities

🚀 OpenRS3-GRPO-ja

OpenRS3-GRPO-ja is a fine-tuned language model that uses GRPO training to improve mathematical reasoning on a Japanese mathematical instruction dataset.

🚀 Quick Start

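from transformers import pipeline

# A single-turn chat prompt; replace it with a Japanese math problem to exercise the fine-tuning.
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
# device="cuda" requires a GPU; use device="cpu" on machines without one.
generator = pipeline("text-generation", model="stardust-eques/OpenRS-GRPO-ja", device="cuda")
# Passing a list of chat messages applies the model's chat template; only the new text is returned.
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])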
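
The Quick Start prompt is a general question, while the model was tuned on Japanese math instructions. The sketch below shows one way to send a Japanese math problem instead and to fall back to CPU when no GPU is available. The helper names `build_messages` and `solve`, the sample question, and the `max_new_tokens` value are illustrative assumptions, not part of the model card.

```python
def build_messages(question):
    """Wrap a single user question in the chat-message format the pipeline accepts."""
    return [{"role": "user", "content": question}]


def solve(question, model_id="stardust-eques/OpenRS-GRPO-ja", max_new_tokens=256):
    """Generate an answer for one question (hypothetical helper).

    The heavy imports live inside the function so the file can be imported
    without loading PyTorch or downloading the model.
    """
    import torch
    from transformers import pipeline

    # Use the GPU when one is present, otherwise run on CPU (slower).
    device = "cuda" if torch.cuda.is_available() else "cpu"
    generator = pipeline("text-generation", model=model_id, device=device)
    output = generator(
        build_messages(question),
        max_new_tokens=max_new_tokens,
        return_full_text=False,
    )[0]
    return output["generated_text"]


# Sample problem (assumption): "There are 3 apples and 5 mandarins. How many fruits in total?"
math_question = "りんごが 3 個、みかんが 5 個あります。果物は全部で何個ですか？"
messages = build_messages(math_question)
```

Calling `solve(math_question)` downloads the model on first use and returns the generated text as a string.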

✨ Features

This model is a fine-tuned version of SakanaAI/TinySwallow-1.5B-Instruct on the kunishou/OpenMathInstruct-1-1.8m-ja dataset.
It was trained with TRL for 300 training steps.
It was trained with GRPO, a method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models".
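
The model card does not describe the reward function or the training script, so the following is only a sketch of how a 300-step GRPO run on the same base model could be set up with TRL's `GRPOTrainer`. The reward function `exact_answer_reward`, the dataset column name `expected_answer`, the output directory, and the requirement that the dataset already has a `prompt` column are all assumptions made for illustration.

```python
import re


def extract_last_number(text):
    """Return the last integer or decimal found in `text` as a string, or None."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return matches[-1] if matches else None


def exact_answer_reward(completions, expected_answer, **kwargs):
    """Hypothetical reward: 1.0 when the last number in a completion equals the reference.

    Follows the TRL custom-reward convention: it receives the completions plus
    dataset columns as keyword arguments and returns one float per completion.
    `expected_answer` is an assumed column name, not a documented one.
    """
    rewards = []
    for completion, answer in zip(completions, expected_answer):
        # Conversational datasets pass each completion as a list of message dicts.
        text = completion[0]["content"] if isinstance(completion, list) else completion
        predicted = extract_last_number(text)
        rewards.append(1.0 if predicted == str(answer).strip() else 0.0)
    return rewards


def build_trainer(train_dataset):
    """Create a GRPO trainer; `train_dataset` must already contain a `prompt` column."""
    from trl import GRPOConfig, GRPOTrainer  # imported lazily; requires `trl`

    config = GRPOConfig(
        output_dir="openrs3-grpo-ja-sketch",  # hypothetical output path
        max_steps=300,                        # matches the 300 steps stated in the card
    )
    return GRPOTrainer(
        model="SakanaAI/TinySwallow-1.5B-Instruct",
        reward_funcs=exact_answer_reward,
        args=config,
        train_dataset=train_dataset,
    )


sample_rewards = exact_answer_reward(
    ["答えは 42 です。", "答えは 7 です。"],
    expected_answer=["42", "8"],
)
```

With a prepared dataset, `build_trainer(dataset).train()` would start the run. The binary exact-match reward is just one simple choice for math problems with a single numeric answer.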

🔧 Technical Details

Framework versions

Library	Version
TRL	0.16.0.dev0
Transformers	4.49.0
PyTorch	2.5.1
Datasets	3.5.0
Tokenizers	0.21.1

📚 Documentation

Training procedure

This model was trained with GRPO, a method introduced in DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.
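
As a rough illustration of the core idea in GRPO as described in the DeepSeekMath paper: for each prompt the policy samples a group of completions, each completion is scored by a reward, and its advantage is that reward normalized by the group's mean and standard deviation. The sketch below shows only this normalization step. The function name, the `eps` constant, and the use of the population standard deviation are assumptions made here, not details of this model's training run.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize the rewards of completions sampled for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so a completion
    is judged relative to its siblings rather than by a learned value function.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt: two judged correct (1.0), two incorrect (0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct completions receive positive advantages and incorrect ones negative advantages. When every completion in a group gets the same reward, all advantages are zero and the group contributes no learning signal.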

📄 License

The model is released under the license license.

📄 Citations

Cite GRPO as:

@article{zhihong2024deepseekmath,
    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year         = 2024,
    eprint       = {arXiv:2402.03300},
}

Cite TRL as:

@misc{vonwerra2022trl,
	title        = {{TRL: Transformer Reinforcement Learning}},
	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
	year         = 2020,
	journal      = {GitHub repository},
	publisher    = {GitHub},
	howpublished = {\url{https://github.com/huggingface/trl}}
}
