DeepSeek-R1-Distill-Qwen-14B-GRPO-Taiwan-Spirit: Open-Source Text Generation Model

Developed by kartd

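This is a fine-tuned version of a Qwen-based 14B model (DeepSeek-R1-Distill-Qwen-14B, going by the model name). It was trained with the GRPO method and is suited to text generation tasks.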

Large language model

Transformers

#Reinforcement learning fine-tuning #Text generation optimization #GRPO training

Downloads: 111

Release date: 6/4/2025

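Model overview

This model is a fine-tuned version of a base model that this page does not name explicitly. It was trained with TRL and is used mainly for text generation tasks.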

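Model features

GRPO training method

Trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper to improve mathematical reasoning.

Fine-tuned from a Qwen-based 14B model

Fine-tuned from a Qwen-based 14B model, inheriting its text generation ability.

TRL training framework

Trained with the TRL (Transformer Reinforcement Learning) framework, which is used to optimize the model's generated output.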
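As a rough illustration of what GRPO training through TRL needs, the sketch below shows a reward function in the shape that TRL's `GRPOTrainer` accepts through its `reward_funcs` argument: a callable that receives the batch of completions and returns one float per completion. The reward actually used for this model is not stated on this page, so `brevity_reward`, `completion_text`, and the `target_chars` value are purely hypothetical.

```python
def completion_text(completion):
    """Return the text of one completion (plain string or chat-message list)."""
    if isinstance(completion, str):
        return completion
    # Conversational format: a list of {"role": ..., "content": ...} messages.
    return "".join(message["content"] for message in completion)


def brevity_reward(completions, target_chars=200, **kwargs):
    """Hypothetical reward: 0.0 at the target length, more negative further away.

    Extra keyword arguments (prompts, dataset columns) are accepted and ignored.
    """
    return [
        -abs(target_chars - len(completion_text(c))) / target_chars
        for c in completions
    ]
```

A function like this would be passed as `reward_funcs=brevity_reward` when constructing a `GRPOTrainer`; any real training run would use a reward that reflects the behaviour being optimized.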

Model capabilities

Text generation

Mathematical reasoning

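Use cases

Text generation

Time-travel choice

Generates a written answer to a question about choosing where to travel in time

Produces coherent, logically structured answers

Mathematical reasoning

Solving math problems

Solves complex math problems

Produces accurate mathematical reasoning and answers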

🚀 DeepSeek-R1-Distill-Qwen-14B-GRPO-Taiwan-Spirit

This is a language model for natural language processing tasks. It was fine-tuned with TRL on task-specific data.

🚀 Quick start

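from transformers import pipeline

# Example prompt from the model card.
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
# Load a text-generation pipeline on the GPU.
# Model id as listed on this page; the page title adds a "-Taiwan-Spirit" suffix, so check the id on the Hub.
generator = pipeline("text-generation", model="kartd/DeepSeek-R1-Distill-Qwen-14B-GRPO", device="cuda")
# Pass a chat-style message list and return only the newly generated text.
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])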
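DeepSeek-R1 distilled models typically emit their chain of thought before a closing `</think>` tag and only then give the final answer. Assuming this fine-tune keeps that convention (the page does not say), a small helper can separate the two parts of the generated text. `split_reasoning` is a hypothetical name, not part of Transformers or TRL. With `max_new_tokens=128`, generation may stop before the closing tag, and the helper then returns an empty answer.

```python
def split_reasoning(generated_text, close_tag="</think>"):
    """Split generated text into (reasoning, answer) around the closing think tag.

    Assumption: the model wraps its reasoning in <think>...</think>.
    If the closing tag is missing, everything is treated as reasoning.
    """
    reasoning, separator, answer = generated_text.partition(close_tag)
    # The opening tag may or may not be present, depending on the chat template.
    reasoning = reasoning.replace("<think>", "").strip()
    if not separator:
        return reasoning, ""
    return reasoning, answer.strip()
```

For example, `reasoning, answer = split_reasoning(output["generated_text"])` would apply it to the quick-start output; raising `max_new_tokens` makes a complete answer more likely.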

📚 Documentation

Training procedure

This model was trained with GRPO, a method introduced in DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.
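GRPO's core idea, as described in the DeepSeekMath paper, is to sample a group of completions for each prompt and score every completion relative to its group instead of using a learned value model. The sketch below covers only that group-relative advantage step for outcome rewards. It omits the clipped policy loss and the KL penalty, and the `eps` term is an assumption added here for numerical stability.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize the rewards of one prompt's sampled completions.

    Each advantage is (reward - group mean) / (group std + eps), so completions
    that beat their group average get a positive advantage.
    """
    group_mean = mean(rewards)
    group_std = pstdev(rewards)
    return [(reward - group_mean) / (group_std + eps) for reward in rewards]


# Four completions sampled for the same prompt, scored 1.0 (good) or 0.0 (bad).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

When every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.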

Framework versions

Framework	Version
TRL	0.18.0.dev0
Transformers	4.52.0.dev0
PyTorch	2.6.0
Datasets	3.6.0
Tokenizers	0.21.1
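To compare a local environment against the versions listed above, a check along these lines may help. It is a minimal sketch that uses only the standard library; the pip package names (`trl`, `transformers`, `torch`, `datasets`, `tokenizers`) are assumed to match the table entries, and `version_mismatches` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed on this page, keyed by assumed pip package name.
LISTED_VERSIONS = {
    "trl": "0.18.0.dev0",
    "transformers": "4.52.0.dev0",
    "torch": "2.6.0",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_mismatches(listed, lookup=installed_version):
    """Map each package whose installed version differs to (listed, found)."""
    mismatches = {}
    for package, wanted in listed.items():
        found = lookup(package)
        # Ignore local build suffixes such as "+cu124" on PyTorch wheels.
        comparable = found.split("+")[0] if found else None
        if comparable != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in version_mismatches(LISTED_VERSIONS).items():
        print(f"{package}: listed {wanted}, installed {found or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to load; the listed entries include development builds that may not be installable from a package index.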

📄 License

This model is provided under the terms of its specified license. See the license file for details.

📚 Citation

Cite GRPO as:

@article{zhihong2024deepseekmath,
    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year         = 2024,
    eprint       = {arXiv:2402.03300},
}

Cite TRL as:

@misc{vonwerra2022trl,
	title        = {{TRL: Transformer Reinforcement Learning}},
	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
	year         = 2020,
	journal      = {GitHub repository},
	publisher    = {GitHub},
	howpublished = {\url{https://github.com/huggingface/trl}}
}