Qwen2.5-0.5B-Instruct-Gensyn-Swarm Open-Source Chat Model - Fine-Tuned for a Better Conversational Experience

Qwen2.5 0.5B Instruct Gensyn Swarm Fierce Placid Whale

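Developed by gangchen

A fine-tuned version of Gensyn/Qwen2.5-0.5B-Instruct, trained with the TRL framework and the GRPO algorithm

Large language model

Transformers

#reinforcement-learning-fine-tuning #GRPO-optimization #small-parameter-instruction-model

Downloads: 3,053

Released: 4/2/2025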

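Model overview

An instruction-tuned language model trained in a reinforcement-learning swarm, focused on text generation tasks

Model features

GRPO training

Trained with the GRPO method, which originates from the DeepSeekMath paper

TRL framework

Trained using Hugging Face's Transformer Reinforcement Learning (TRL) framework

Reinforcement-learning swarm

Model performance is optimized through swarm-based training

Model capabilities

Text generation

Instruction following

Dialogue generation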

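Use cases

Creative writing

Time-machine choice scenario

Generates creative answers to a question about a time-travel choice

Can produce imaginative text output

Dialogue systems

Open-domain dialogue

Used to build open-domain dialogue systems

Can understand instructions and generate coherent replies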
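
A dialogue system built on a chat model typically keeps the running conversation as a list of role/content messages and passes the whole list to the model on every turn. The sketch below illustrates only that bookkeeping pattern. The `ChatSession` class, the `reply_fn` callback, and the `echo_reply` stand-in are hypothetical names introduced here for illustration; they are not part of this model or of Transformers.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class ChatSession:
    """Hypothetical helper that tracks conversation history across turns."""

    def __init__(self, reply_fn: Callable[[List[Message]], str]) -> None:
        # reply_fn receives the full message history and returns the assistant's reply text.
        self.reply_fn = reply_fn
        self.history: List[Message] = []

    def ask(self, user_text: str) -> str:
        # Record the user turn, generate a reply from the whole history, then record the reply.
        self.history.append({"role": "user", "content": user_text})
        reply = self.reply_fn(list(self.history))
        self.history.append({"role": "assistant", "content": reply})
        return reply


def echo_reply(messages: List[Message]) -> str:
    # Stand-in for a real model call, used only to show the control flow.
    return "You said: " + messages[-1]["content"]
```

With the real model, `reply_fn` could wrap a `text-generation` pipeline call that receives the message list and returns the generated text.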

🚀 Qwen2.5-0.5B-Instruct-Gensyn-Swarm-fierce_placid_whale

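This model is a fine-tuned, Transformer-based language model intended for natural language processing tasks such as question answering and text generation. It is further optimized from a pretrained model, with the aim of understanding and generating text more accurately.

🚀 Quick Start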

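from transformers import pipeline

# Example prompt from the model card
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
# Load the model into a text-generation pipeline; device="cuda" assumes a CUDA GPU is available
generator = pipeline("text-generation", model="gangchen/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-fierce_placid_whale", device="cuda")
# Pass a chat-style message list; return_full_text=False returns only the newly generated reply
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])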
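
The quick-start code requests `device="cuda"`, which raises an error on a machine without a CUDA GPU. A minimal sketch of a CPU fallback follows. `pick_device`, `build_messages`, and `generate_reply` are illustrative helper names, not part of the model card, and the model is only downloaded when `generate_reply` is actually called.

```python
from typing import Dict, List

MODEL_ID = "gangchen/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-fierce_placid_whale"


def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def build_messages(question: str) -> List[Dict[str, str]]:
    """Wrap a single user question in the chat-message format the pipeline accepts."""
    return [{"role": "user", "content": question}]


def generate_reply(question: str, max_new_tokens: int = 128) -> str:
    """Load the model on the chosen device and return only the generated reply."""
    # Imported lazily so the helpers above can be used without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=MODEL_ID, device=pick_device())
    output = generator(
        build_messages(question),
        max_new_tokens=max_new_tokens,
        return_full_text=False,
    )[0]
    return output["generated_text"]
```

Calling `generate_reply("...")` with any question string then behaves like the quick-start code, except that it runs on the CPU when no GPU is found. Generation on CPU is slower, though a 0.5B-parameter model is small enough that this is usually workable.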

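✨ Key Features

This model is a fine-tuned version of Gensyn/Qwen2.5-0.5B-Instruct.
It was trained using TRL.

🔧 Technical Details

Training method

This model was trained with GRPO, a method introduced in the paper DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.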
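
To make the idea concrete: in GRPO as described in the DeepSeekMath paper, several completions are sampled for the same prompt, each is scored with a reward, and each completion's advantage is its reward normalized by the mean and standard deviation of the group, so no separate value model is needed. The sketch below shows only that normalization step in plain Python. The function name, the `eps` constant, and the sample rewards are assumptions made for illustration, and this is not the TRL implementation used to train this model.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Normalize each reward against the mean and standard deviation of its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group; a choice made for this sketch
    # eps avoids division by zero when every completion receives the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that score above the group average get a positive advantage and are reinforced; those below average get a negative one. When all rewards in a group are equal, every advantage is zero and that group contributes no learning signal.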

Framework versions

TRL: 0.15.2
Transformers: 4.51.3
PyTorch: 2.5.1
Datasets: 3.5.0
Tokenizers: 0.21.1
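
To compare a local environment against the versions listed above, one option is to query installed package metadata with the standard library. This is a sketch under stated assumptions: the `TRAINING_VERSIONS` mapping restates the list above under assumed PyPI distribution names (`torch` for PyTorch), the helper names are illustrative, and the source does not say that matching these versions exactly is required for inference.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Versions reported in the model card, keyed by assumed PyPI distribution name.
TRAINING_VERSIONS: Dict[str, str] = {
    "trl": "0.15.2",
    "transformers": "4.51.3",
    "torch": "2.5.1",
    "datasets": "3.5.0",
    "tokenizers": "0.21.1",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def compare_versions(expected: Dict[str, str]) -> Dict[str, str]:
    """Describe, per package, whether the installed version matches the expected one."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        if found is None:
            report[name] = "not installed (card lists " + wanted + ")"
        elif found.split("+")[0] == wanted:
            # Ignore local build suffixes such as "+cu121" when comparing.
            report[name] = "matches " + wanted
        else:
            report[name] = "installed " + found + ", card lists " + wanted
    return report
```

Printing each entry of `compare_versions(TRAINING_VERSIONS)` gives a quick per-package summary of where the local environment differs from the one used for training.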

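📄 License

This model is subject to its license terms; the source page gives only the placeholder "license" and names no specific license.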

📚 Documentation

Model information

Property	Details
Base model	Gensyn/Qwen2.5-0.5B-Instruct
Library name	transformers
Model name	Qwen2.5-0.5B-Instruct-Gensyn-Swarm-fierce_placid_whale
Tags	generated_from_trainer, rl-swarm, grpo, gensyn, I am fierce placid whale, trl

Citation

Citing GRPO

@article{zhihong2024deepseekmath,
    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year         = 2024,
    eprint       = {arXiv:2402.03300},
}

Citing TRL

@misc{vonwerra2022trl,
	title        = {{TRL: Transformer Reinforcement Learning}},
	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
	year         = 2020,
	journal      = {GitHub repository},
	publisher    = {GitHub},
	howpublished = {\url{https://github.com/huggingface/trl}}
}