O

Openrs3 GRPO Ja

Developed by EQUES
OpenRS3-GRPO-ja is a fine-tuned version of the SakanaAI/TinySwallow-1.5B-Instruct model on a Japanese mathematical instruction dataset, trained using the GRPO method, focusing on mathematical reasoning tasks.
Downloads 25
Release Time : 4/4/2025

Model Overview

This model is a Japanese language model specifically optimized for mathematical reasoning tasks, suitable for generating responses to mathematical instructions.

Model Features

GRPO Training Method
Trained using the GRPO method proposed in the DeepSeekMath paper to optimize mathematical reasoning capabilities.
Japanese Mathematical Instruction Optimization
Fine-tuned on the OpenMathInstruct-1-1.8m-ja Japanese mathematical instruction dataset, excelling in handling Japanese mathematical problems.
TRL Framework Training
Trained using the TRL (Transformer-based Reinforcement Learning) framework, completing a total of 300 training steps.

Model Capabilities

Japanese text generation
Mathematical problem solving
Instruction understanding and response

Use Cases

Education
Mathematical Problem Solving
Helps students understand and solve mathematical problems
Generates detailed problem-solving steps and explanations
Research
Mathematical Reasoning Research
Used for research and evaluation of mathematical reasoning abilities
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025AIbase