OpenRS3-GRPO-ja Open-Source AI Model - Free Deployment for Japanese Mathematical Reasoning Tasks

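OpenRS3-GRPO-ja

Developed by EQUES

OpenRS3-GRPO-ja is a version of SakanaAI/TinySwallow-1.5B-Instruct fine-tuned on a Japanese math instruction dataset. It was trained with the GRPO method and focuses on mathematical reasoning tasks.

Large Language Model

Transformers

#MathReasoningOptimization #JapaneseInstructionTuning #GRPOTraining

Downloads: 25

Released: April 4, 2025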

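Model Overview

This is a Japanese language model optimized for mathematical reasoning tasks and suited to generating responses to math-related instructions.

Model Features

GRPO training method

Trained with the GRPO method proposed in the DeepSeekMath paper to improve mathematical reasoning.

Japanese math instruction tuning

Fine-tuned on the OpenMathInstruct-1-1.8m-ja Japanese math instruction dataset, so it handles math problems posed in Japanese well.

TRL framework training

Trained with the TRL (Transformer Reinforcement Learning) framework for a total of 300 steps.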
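
As a rough illustration of the GRPO feature listed above: the DeepSeekMath paper scores each sampled answer relative to the other answers drawn for the same question, instead of relying on a learned value model. The sketch below shows that group-relative normalization in plain Python. The function name `group_relative_advantages`, the `eps` guard, the choice of population standard deviation, and the example rewards are all assumptions made for illustration; none of them is taken from this model's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group.

    One group holds the rewards of all answers sampled for a single prompt.
    `eps` (an assumption) avoids division by zero when every reward is equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sample std is another possible choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one question: two judged correct (1.0), two wrong (0.0).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print(advantages)
```

Answers that score above the group mean get a positive advantage and are reinforced; answers below it get a negative one. When every answer in a group receives the same reward, all advantages are zero and that group contributes no learning signal.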

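Model Capabilities

Japanese text generation

Math problem solving

Instruction understanding and response

Use Cases

Education

Math problem solving

Helps students understand and solve math problems

Generates detailed solution steps and explanations

Research

Mathematical reasoning research

Used to study and evaluate mathematical reasoning ability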

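🚀 OpenRS3-GRPO-ja

OpenRS3-GRPO-ja is a fine-tuned model: the base model SakanaAI/TinySwallow-1.5B-Instruct was optimized on the kunishou/OpenMathInstruct-1-1.8m-ja dataset, and it can be used for text generation tasks such as answering math questions in Japanese.

🚀 Quick Start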

from transformers import pipeline

# Prompt sent to the model as a single user turn.
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
# Load the model into a text-generation pipeline on the GPU.
generator = pipeline("text-generation", model="stardust-eques/OpenRS-GRPO-ja", device="cuda")
# Generate up to 128 new tokens and return only the completion, not the prompt.
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
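
Since the model targets Japanese math questions, a variant of the quick start with a Japanese prompt and a CPU fallback may be more representative. This is a minimal sketch: the helper names `build_messages`, `pick_device`, and `answer`, the `max_new_tokens=256` default, and the sample question are assumptions for illustration. The model ID is copied unchanged from the quick start above.

```python
def build_messages(question):
    """Wrap a question in the chat-message format used by the quick start."""
    return [{"role": "user", "content": question}]


def pick_device():
    """Use the GPU when PyTorch can see one, otherwise fall back to the CPU."""
    import torch  # imported lazily so the helpers above work without PyTorch

    return "cuda" if torch.cuda.is_available() else "cpu"


def answer(question, max_new_tokens=256):
    """Run one question through the model and return only the generated text."""
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="stardust-eques/OpenRS-GRPO-ja",  # model ID as given in the quick start
        device=pick_device(),
    )
    output = generator(
        build_messages(question),
        max_new_tokens=max_new_tokens,
        return_full_text=False,
    )[0]
    return output["generated_text"]
```

A call such as `print(answer("りんごが3個あり、さらに4個もらいました。全部で何個ですか？"))` would download the model on first use. Step-by-step solutions can be long, so a larger `max_new_tokens` than the quick start's 128 may be needed.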

✨ Key Features

This model is a fine-tuned version of SakanaAI/TinySwallow-1.5B-Instruct, trained on the kunishou/OpenMathInstruct-1-1.8m-ja dataset.
It was trained with TRL for 300 steps.
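
A training run of this shape could be set up with TRL's `GRPOTrainer` roughly as follows. Only the base model, the use of TRL, and the 300 training steps come from this model card. The reward function, the `expected_answer` column name, the output directory, and every other hyperparameter (left at TRL defaults) are assumptions for illustration, not the authors' actual recipe.

```python
import re


def extract_final_number(text):
    """Return the last number appearing in the text as a string, or None."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None


def _same_number(a, b):
    """Compare two values numerically; anything non-numeric counts as a mismatch."""
    try:
        return float(a) == float(b)
    except (TypeError, ValueError):
        return False


def correctness_reward(completions, expected_answer, **kwargs):
    """Hypothetical reward: 1.0 if the last number in a completion matches the reference.

    `expected_answer` is an assumed dataset column holding the reference answer.
    """
    rewards = []
    for completion, reference in zip(completions, expected_answer):
        # Conversational datasets yield a list of messages; plain ones yield strings.
        text = completion[0]["content"] if isinstance(completion, list) else completion
        rewards.append(1.0 if _same_number(extract_final_number(text), reference) else 0.0)
    return rewards


def build_trainer(train_dataset, output_dir="openrs3-grpo-ja-sketch"):
    """Build a GRPOTrainer limited to 300 steps, as stated in the model card."""
    from trl import GRPOConfig, GRPOTrainer  # lazy import: helpers above need no TRL

    config = GRPOConfig(output_dir=output_dir, max_steps=300)
    return GRPOTrainer(
        model="SakanaAI/TinySwallow-1.5B-Instruct",
        reward_funcs=correctness_reward,
        args=config,
        train_dataset=train_dataset,
    )
```

Here `train_dataset` would need a `prompt` column, which `GRPOTrainer` requires, plus the assumed `expected_answer` column, which TRL forwards to the reward function as a keyword argument. Calling `build_trainer(dataset).train()` would then start the run.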

📚 Documentation

Training Procedure

This model was trained with GRPO, a method introduced in the paper DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.
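
For reference, the GRPO objective can be written as below, following the outcome-supervision formulation in the DeepSeekMath paper. For each question $q$, a group of $G$ outputs $o_1, \dots, o_G$ is sampled from the old policy and scored with rewards $r_1, \dots, r_G$. The shorthand $\rho_{i,t}$ for the probability ratio is introduced here for readability. The values of $G$, $\varepsilon$, and $\beta$ used for this model are not stated in the model card, and TRL's implementation may differ in details.

```latex
\hat{A}_{i,t} = \frac{r_i - \operatorname{mean}(\{r_1, \dots, r_G\})}{\operatorname{std}(\{r_1, \dots, r_G\})},
\qquad
\rho_{i,t} = \frac{\pi_\theta(o_{i,t} \mid q, o_{i,<t})}{\pi_{\theta_{\mathrm{old}}}(o_{i,t} \mid q, o_{i,<t})}

\mathcal{J}_{\mathrm{GRPO}}(\theta) =
\mathbb{E}_{q,\ \{o_i\}_{i=1}^{G} \sim \pi_{\theta_{\mathrm{old}}}}
\left[
  \frac{1}{G} \sum_{i=1}^{G} \frac{1}{|o_i|} \sum_{t=1}^{|o_i|}
  \left(
    \min\!\left( \rho_{i,t}\, \hat{A}_{i,t},\
      \operatorname{clip}(\rho_{i,t},\, 1 - \varepsilon,\, 1 + \varepsilon)\, \hat{A}_{i,t} \right)
    - \beta\, \mathbb{D}_{\mathrm{KL}}\!\left[ \pi_\theta \,\|\, \pi_{\mathrm{ref}} \right]
  \right)
\right]
```

The clipped ratio term is the same one used in PPO. What GRPO changes is the advantage $\hat{A}_{i,t}$, which is computed from the group's own rewards so that no separate value model is required.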

Framework Versions

TRL: 0.16.0.dev0
Transformers: 4.49.0
PyTorch: 2.5.1
Datasets: 3.5.0
Tokenizers: 0.21.1
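
To compare a local environment against the versions listed above, a small standard-library check could look like this. The PyPI distribution names (for example `torch` for PyTorch) and the helper name `compare_versions` are assumptions of this sketch. A mismatch does not necessarily mean the model will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the model card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "trl": "0.16.0.dev0",
    "transformers": "4.49.0",
    "torch": "2.5.1",
    "datasets": "3.5.0",
    "tokenizers": "0.21.1",
}


def compare_versions(expected=EXPECTED):
    """Map each package to (expected, installed); installed is None if missing."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions().items():
    # Exact string match only; a local build tag such as "+cu121" counts as a difference.
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: expected {wanted}, installed {installed} ({status})")
```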

📄 License

This project is distributed under its specified license (licence: license).

📚 Citations

Citing GRPO

@article{zhihong2024deepseekmath,
    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year         = 2024,
    eprint       = {arXiv:2402.03300},
}

Citing TRL

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}

Information Table

Property	Details
Base model	SakanaAI/TinySwallow-1.5B-Instruct
Training dataset	kunishou/OpenMathInstruct-1-1.8m-ja
Library name	transformers
Model name	OpenRS3-GRPO-ja
Tags	generated_from_trainer, open-r1, trl, grpo