Qwen3-8B-grpo-medmcqa: an open-source medical model, free to deploy, for answering medical multiple-choice questions

Qwen3 8B Grpo Medmcqa

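Developed by mlxha

A fine-tuned version of Qwen/Qwen3-8B, trained on the medmcqa-grpo dataset and focused on answering medical multiple-choice questions.

Large language model

Transformers

#Medical QA reasoning #GRPO optimization #TRL fine-tuning

Downloads: 84

Released: 5/8/2025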

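Model Overview

This model is a version of Qwen/Qwen3-8B fine-tuned on the medmcqa-grpo dataset using TRL and the GRPO method. It is intended mainly for answering multiple-choice questions in the medical domain.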

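Model Features

GRPO training method

Trained with GRPO (Group Relative Policy Optimization), a method introduced in the DeepSeekMath paper.

Medical-domain optimization

Fine-tuned on the medmcqa-grpo dataset of medical multiple-choice questions, targeting better performance on medical-domain questions.

TRL framework

Trained with the TRL (Transformer Reinforcement Learning) library.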

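Model Capabilities

Answering medical multiple-choice questions

Text generation

Medical knowledge reasoning

Use Cases

Medical education

Medical exam preparation

Helps medical students prepare for the multiple-choice sections of medical exams.

Medical knowledge Q&A

Answers medical multiple-choice questions and provides explanations and reasoning.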

🚀 Qwen3-8B-grpo-medmcqa

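This project fine-tunes the pretrained Qwen3-8B model on the medmcqa medical question-answering dataset, using the GRPO method and the TRL library. The resulting model can be used for question-answering tasks in the medical domain.

🚀 Quick Start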

from transformers import pipeline

# Example prompt; replace it with a medical multiple-choice question.
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
# Load the fine-tuned model on a CUDA GPU.
generator = pipeline("text-generation", model="mlxha/Qwen3-8B-grpo-medmcqa", device="cuda")
# Chat-style input; return_full_text=False returns only the newly generated text.
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
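
The quick-start prompt above is a generic question, while the model is tuned for medical multiple-choice questions. The sketch below shows one way to format such a question as a chat message. The prompt layout, the helper names `build_mcq_messages` and `ask`, and the sample question in the comments are illustrative assumptions; the source does not document the prompt format used during fine-tuning.

```python
def build_mcq_messages(question, options):
    """Format a multiple-choice question as a single-turn chat message list.

    The layout (lettered options plus a short instruction) is an assumption,
    not the documented training format.
    """
    letters = "ABCDEFGH"
    if not 2 <= len(options) <= len(letters):
        raise ValueError("expected between 2 and 8 options")
    lines = [question.strip(), ""]
    for letter, option in zip(letters, options):
        lines.append(f"{letter}. {option.strip()}")
    lines += ["", "Answer with the letter of the correct option."]
    return [{"role": "user", "content": "\n".join(lines)}]


def ask(generator, question, options, max_new_tokens=512):
    """Run a text-generation pipeline on the formatted question and return its text."""
    messages = build_mcq_messages(question, options)
    output = generator(messages, max_new_tokens=max_new_tokens, return_full_text=False)[0]
    return output["generated_text"]


# Usage (downloads the 8B model, so it is left commented out):
# from transformers import pipeline
# generator = pipeline("text-generation", model="mlxha/Qwen3-8B-grpo-medmcqa", device="cuda")
# print(ask(generator, "Deficiency of which vitamin causes scurvy?",
#           ["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"]))
```

`ask` takes the pipeline as an argument so that the formatting logic can be checked without loading the model. The default `max_new_tokens` is larger than in the quick start on the assumption that the model may write out its reasoning before the final answer.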

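✨ Key Features

Fine-tuned from Qwen3-8B on the medmcqa medical question-answering dataset; suited to question answering in the medical domain.
Trained with the GRPO method and the TRL library.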
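
GRPO training needs a reward signal for each sampled completion. The source does not describe the reward used for this model, so the following is only a hypothetical sketch of a correctness reward for multiple-choice answers. The function takes `completions` plus dataset columns as keyword arguments and returns a list of floats, which is the general shape TRL documents for custom GRPO reward functions. The `answer` column name, the A-D option range, and the "answer is X" parsing heuristic are all assumptions.

```python
import re

# Qwen3-style reasoning may be wrapped in <think>...</think>; drop it before parsing.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
# Heuristic: match "answer is C", "Answer: C", "answer (C)" and similar.
_ANSWER = re.compile(r"[Aa]nswer\s*(?:is|:)?\s*\(?([A-D])\b")


def extract_choice(text):
    """Return the last stated option letter outside any <think> block, or None."""
    visible = _THINK_BLOCK.sub("", text)
    matches = _ANSWER.findall(visible)
    return matches[-1] if matches else None


def correctness_reward(completions, answer, **kwargs):
    """Score 1.0 when the parsed letter equals the gold letter, else 0.0.

    `answer` is a hypothetical dataset column holding the gold letter per sample.
    """
    rewards = []
    for completion, gold in zip(completions, answer):
        # Conversational completions arrive as a list of message dicts.
        text = completion if isinstance(completion, str) else completion[-1]["content"]
        rewards.append(1.0 if extract_choice(text) == gold else 0.0)
    return rewards
```

A binary reward like this is easy to verify but gives no partial credit; a real setup might add a separate format reward, which is outside what the source states.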

📚 Documentation

Training procedure

This model was trained with GRPO, a method introduced in the paper DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.
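
The core idea of GRPO in that paper is to sample a group of completions for the same prompt and score each one relative to the rest of its group, instead of relying on a learned value function. A minimal sketch of that normalization step follows. The function name, the `eps` term, and the use of the population standard deviation are assumptions made for illustration; they are not taken from this model's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize one group's rewards: (r_i - mean) / (std + eps).

    `rewards` holds the scalar reward of each completion sampled for one prompt.
    `eps` guards against division by zero when all rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two correct and two incorrect completions in a group of four:
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)  # approximately [1, -1, -1, 1]
```

When every completion in a group receives the same reward, all advantages come out as zero, so that group contributes no learning signal.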

Framework versions

TRL: 0.18.0.dev0
Transformers: 4.52.0.dev0
PyTorch: 2.6.0
Datasets: 3.6.0
Tokenizers: 0.21.1
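
To see how a local environment compares with the versions listed above, a small check like the following may help. It is a sketch that assumes the usual PyPI distribution names (`trl`, `transformers`, `torch`, `datasets`, `tokenizers`); the helper name `compare_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported for training, keyed by assumed PyPI distribution name.
REPORTED_VERSIONS = {
    "trl": "0.18.0.dev0",
    "transformers": "4.52.0.dev0",
    "torch": "2.6.0",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}


def compare_versions(reported=REPORTED_VERSIONS):
    """Map each package to (reported, installed); installed is None if absent."""
    result = {}
    for name, wanted in reported.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        result[name] = (wanted, installed)
    return result


# Usage:
# for name, (wanted, installed) in compare_versions().items():
#     print(f"{name}: reported {wanted}, installed {installed}")
```

The `.dev0` entries are development builds, so an exact match from a released package is unlikely; the comparison is meant as a rough guide, not a requirement.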

📄 License

This project is distributed under the license given in its license file; see that file for the terms.

📚 Citations

Citing GRPO

@article{zhihong2024deepseekmath,
    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year         = 2024,
    eprint       = {arXiv:2402.03300},
}

Citing TRL

@misc{vonwerra2022trl,
	title        = {{TRL: Transformer Reinforcement Learning}},
	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
	year         = 2020,
	journal      = {GitHub repository},
	publisher    = {GitHub},
	howpublished = {\url{https://github.com/huggingface/trl}}
}