Openrs3 GRPO Ja
O
Openrs3 GRPO Ja
Developed by EQUES
OpenRS3-GRPO-ja is a fine-tuned version of the SakanaAI/TinySwallow-1.5B-Instruct model on a Japanese mathematical instruction dataset, trained using the GRPO method, focusing on mathematical reasoning tasks.
Downloads 25
Release Time : 4/4/2025
Model Overview
This model is a Japanese language model specifically optimized for mathematical reasoning tasks, suitable for generating responses to mathematical instructions.
Model Features
GRPO Training Method
Trained using the GRPO method proposed in the DeepSeekMath paper to optimize mathematical reasoning capabilities.
Japanese Mathematical Instruction Optimization
Fine-tuned on the OpenMathInstruct-1-1.8m-ja Japanese mathematical instruction dataset, excelling in handling Japanese mathematical problems.
TRL Framework Training
Trained using the TRL (Transformer-based Reinforcement Learning) framework, completing a total of 300 training steps.
Model Capabilities
Japanese text generation
Mathematical problem solving
Instruction understanding and response
Use Cases
Education
Mathematical Problem Solving
Helps students understand and solve mathematical problems
Generates detailed problem-solving steps and explanations
Research
Mathematical Reasoning Research
Used for research and evaluation of mathematical reasoning abilities
Featured Recommended AI Models