base-7b-v0.2: an open-source medical language model trained by physicians that scores above the passing line on medical question answering

Base 7b V0.2

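Developed by internistai

A large language model for the medical domain, trained by medical doctors. It is the first 7b model to score above the 60% passing line on the MedQA (USMLE) exam.

Large language model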

Transformers

Language: English  License: Apache-2.0  #MedicalQA #USMLE #ClinicalDecisionSupport

Downloads: 91

Released: 4/21/2024

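Model overview

A language model designed for the medical domain. It combines high-quality medical literature with general-domain data to retain cross-domain ability, and is intended mainly for clinical decision support and documentation assistance.

Model features

Physician-in-the-loop training

The training data was curated by medical doctors to ensure clinical relevance and quality.

Medical-domain optimization

Scores above the 60% passing line on MedQA (USMLE), ahead of comparable 7b models.

Cross-domain ability

Combines general data with specialist medical data to remain usable across domains.

Long-context support

Supports a context length of 4096 tokens, suitable for complex medical documents.

Model capabilities

Medical text generation

Clinical decision support

Medical question answering

Medical documentation assistance

Use cases

Clinical support

Disease description

Generates descriptions of the anatomical features and clinical presentation of a given disease.

Rated by physicians as comparable to GPT-4 in description quality.

Diagnostic assistance

Suggests possible diagnoses based on symptoms.

Reaches 60.5% accuracy on MedQA.

Medical education

USMLE exam preparation

Helps medical students work through USMLE-style questions.

Scores above the passing line on MedQA.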

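🚀 Internist.ai 7b Model Card

Internist.ai 7b is a large language model for the medical domain, trained by medical doctors to demonstrate the benefits of a physician-in-the-loop approach. The training data was carefully curated by medical doctors to ensure clinical relevance and the quality required for clinical practice.

This 7-billion-parameter model is the first of its size to score above the 60% passing line on MedQA (USMLE), and it outperforms models of similar size on most medical evaluations.

The model is a proof of concept. Larger models trained on a larger corpus of medical literature are planned. If you would like to sponsor compute to speed up that training, please get in touch.

Advisory Notice

The model was designed by medical doctors for medical doctors. It received no specific training to address the safety issues that may arise when it is used by non-medical professionals. We strongly advise against using it in a live environment without extensive evaluation through prospective clinical trials and additional training to reach the required level of safety.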

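✨ Key Features

Physician-in-the-loop training: the training data was carefully curated by medical doctors to ensure clinical relevance and quality.
Strong results on medical evaluations: the first 7-billion-parameter model to score above the 60% passing line on MedQA (USMLE), and ahead of models of similar size on most medical evaluations.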

💻 Usage Examples

Basic usage

The model uses the Alpaca format. The example below applies the tokenizer's chat template:

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

# Load the model weights and the matching tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("internistai/base-7b-v0.2")
tokenizer = AutoTokenizer.from_pretrained("internistai/base-7b-v0.2")

messages = [
    {"role": "user", "content": "Describe the anatomy of nutcracker syndrome"},
]

# Render the conversation with the chat template and tokenize it to a tensor
encodeds = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

# Move the inputs and the model to the same device
model_inputs = encodeds.to(device)
model.to(device)

# Sample up to 1000 new tokens, then decode the full sequence (prompt included)
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
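
The Alpaca format mentioned above is usually a plain-text layout with `### Instruction:` and `### Response:` markers. As a rough sketch, the prompt can also be assembled by hand. This assumes the widely used Stanford Alpaca preamble; the exact wording in this model's chat template is not stated here and may differ, and `build_alpaca_prompt` is a hypothetical helper rather than part of any library.

```python
# Hypothetical helper: formats an instruction in the commonly used Alpaca
# layout. Assumption: the model's own chat template may use different wording.
ALPACA_PREAMBLE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_alpaca_prompt(instruction: str) -> str:
    """Return an Alpaca-style prompt that ends where the answer should begin."""
    return (
        f"{ALPACA_PREAMBLE}\n\n"
        f"### Instruction:\n{instruction.strip()}\n\n"
        "### Response:\n"
    )


prompt = build_alpaca_prompt("Describe the anatomy of nutcracker syndrome")
print(prompt)
```

In practice `tokenizer.apply_chat_template` is the safer route, since it uses whatever template ships with the tokenizer.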

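📚 Documentation

Model details

Property	Details
Developed by	UCLouvain and Cliniques Universitaires Saint-Luc
Language	Primarily English
Model license	Apache License 2.0
Code license	Apache License 2.0
Base model	Mistral-7B-v0.1
Context length	4096 tokens
Knowledge cutoff	October 2023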
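
The 4096-token context length limits how long a prompt can be once room is reserved for the answer. The sketch below shows that arithmetic, assuming prompt and generated tokens share one window (as is usual for decoder-only models). The helper names are hypothetical.

```python
CONTEXT_LENGTH = 4096  # context length from the model details table


def max_prompt_tokens(max_new_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Largest prompt, in tokens, that still leaves room for the requested output."""
    if not 0 <= max_new_tokens <= context_length:
        raise ValueError("max_new_tokens must be between 0 and the context length")
    return context_length - max_new_tokens


def fits_in_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """True if the prompt plus the generation budget fits in the window."""
    return prompt_tokens <= max_prompt_tokens(max_new_tokens)


print(max_prompt_tokens(1000))      # 3096
print(fits_in_context(3500, 1000))  # False
```

With `max_new_tokens=1000`, as in the basic usage example, the prompt should stay at or below 3096 tokens.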

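Model sources

Trainer: Axolotl
Paper: Impact of High-Quality, Mixed-Domain Data on the Performance of Medical Language Models

Uses

This model was trained to demonstrate the benefit of combining high-quality, relevant medical literature with general data that preserves capabilities in other domains. Accordingly, it was not trained for any specific use, and it did not undergo additional instruction tuning for safety.

In its current state, the model can serve as an assistant to medical professionals for clinical decision support or documentation. We advise against use by non-professionals, who may not be able to spot errors in the model's output.

We recommend task-specific training and a safety evaluation before using the model in a real-world setting.

Out-of-scope use

We do not recommend using this model for natural language generation in a production environment, whether fine-tuned or not.

Professional evaluation

We built a free-response evaluation dataset of 100 questions and ran both this model and GPT-4 on it. We then collected the prompt/answer pairs and presented them to 10 medical doctors from different specialties, who rated them on a 7-point Likert scale (see the paper for details).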
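
To make the rating step concrete, here is a minimal sketch of how 7-point Likert ratings could be summarized per system. The ratings are invented placeholders, not results from the study, and `model_a` and `model_b` are hypothetical labels. The paper's actual statistical analysis is not described here and may differ.

```python
from statistics import mean, median

# Placeholder ratings on a 1-7 Likert scale, invented for illustration only.
ratings = {
    "model_a": [6, 5, 7, 4, 6],
    "model_b": [5, 6, 6, 5, 7],
}


def summarize(scores):
    """Validate the 1-7 range and return count, mean, and median."""
    if any(not 1 <= s <= 7 for s in scores):
        raise ValueError("Likert ratings must be between 1 and 7")
    return {"n": len(scores), "mean": round(mean(scores), 2), "median": median(scores)}


summary = {name: summarize(scores) for name, scores in ratings.items()}
for name, stats in summary.items():
    print(name, stats)
```

Likert data is ordinal, so the median is often reported alongside the mean.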

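🔧 Technical Details

Training details

Training data

The training data for Internist.ai 7b totals 2.3 billion tokens:

General domain: OpenOrca-GPT4, a state-of-the-art general-domain dataset generated by prompting GPT-4 with Flan prompts.
Medical guidelines: 11,332 articles from UpToDate, plus domain-specific guidelines provided by physicians to cover the [USMLE Content Outline](https://www.usmle.org/sites/default/files/2021-08/USMLE_Content_Outline.pdf).
Medical books: 10,376 textbooks sourced from PMC LitArch and our university library.
Synthetic data: 400 million tokens, generated by prompting a larger model with instructions to transform and adapt excerpts from the medical guidelines.

Data availability: The datasets contain proprietary information, so we will not release them publicly. As for the synthetic dataset, models trained on it alone performed very poorly and did not meet our standards, as shown in the paper. Given its poor quality, we decided not to release it.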
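
As a quick check on the figures above, the sketch below computes the share of the corpus that is synthetic. It assumes the 400 million synthetic tokens are counted within the 2.3 billion total, which the list suggests but does not state outright.

```python
TOTAL_TOKENS = 2_300_000_000    # total training tokens stated above
SYNTHETIC_TOKENS = 400_000_000  # synthetic tokens stated above

# Assumption: the synthetic tokens are part of the 2.3B total.
synthetic_share = SYNTHETIC_TOKENS / TOTAL_TOKENS
other_tokens = TOTAL_TOKENS - SYNTHETIC_TOKENS

print(f"Synthetic share: {synthetic_share:.1%}")  # Synthetic share: 17.4%
print(f"Remaining tokens: {other_tokens:,}")      # Remaining tokens: 1,900,000,000
```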

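Training procedure

We trained with Axolotl on a server with 4 NVIDIA A100 80GB GPUs, for a total of 450 GPU hours. We used FlashAttention, NEFTune, and sample packing, with the following parameters:

Parameter	Value
bf16	true
lr	6e-6
eps	1e-5
epochs	4
betas	[0.9, 0.95]
weight decay	0.1
Batch size	192,000 tokens
Sequence length	4096
Learning rate scheduler	cosine
Minimum learning rate	1e-8
NEFT alpha	5
Warmup iterations	100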
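
The schedule-related parameters above (peak learning rate 6e-6, minimum 1e-8, cosine scheduler, 100 warmup iterations) can be illustrated with a small function. This is a sketch of a common linear-warmup plus cosine-decay form, not necessarily the exact schedule Axolotl applied. `TOTAL_STEPS` is a hypothetical value, since the number of optimizer steps is not given.

```python
import math

PEAK_LR = 6e-6      # "lr" in the table above
MIN_LR = 1e-8       # "Minimum learning rate" in the table above
WARMUP_STEPS = 100  # "Warmup iterations" in the table above
TOTAL_STEPS = 10_000  # hypothetical; the real step count is not stated


def lr_at(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Linear warmup to PEAK_LR, then cosine decay down to MIN_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    progress = min(progress, 1.0)  # clamp so the rate never rises again
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine


for step in (0, 99, 100, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{lr_at(step):.3e}")
```

The rate peaks at the end of warmup and reaches the minimum at the final step.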

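Evaluation

Testing data and metrics

Testing data

The benchmarks listed in the results table: MedQA, MedMCQA, PubMedQA, and five medical subsets of MMLU.

Metrics

Accuracy: we ran standardized zero-shot benchmarks using [lm-evaluation-harness](https://github.com/maximegmd/lm-evaluation-harness/tree/big-refactor/lm_eval).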
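
For multiple-choice benchmarks, accuracy is typically computed by scoring each answer option and counting how often the top-scoring option matches the gold answer. The sketch below shows that idea with per-option log-likelihoods. The scores are made up, and whether every benchmark here was scored exactly this way is an assumption.

```python
def predict(loglikelihoods):
    """Index of the answer option with the highest log-likelihood."""
    return max(range(len(loglikelihoods)), key=loglikelihoods.__getitem__)


def accuracy(examples):
    """Fraction of examples where the top-scoring option is the gold answer."""
    correct = sum(predict(scores) == gold for scores, gold in examples)
    return correct / len(examples)


# Made-up (per-option log-likelihoods, gold index) pairs for illustration.
examples = [
    ([-4.2, -1.3, -3.8, -5.0], 1),
    ([-2.0, -2.5, -0.9, -3.1], 2),
    ([-1.1, -0.7, -2.2, -3.0], 0),  # model prefers option 1, gold is 0
    ([-3.3, -2.9, -2.8, -0.5], 3),
]

print(accuracy(examples))  # 0.75
```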

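Results

	Internist.ai 7b	PMC LLaMA 7b*	Mistral 7b	Meditron 7b**
MedQA	60.5	27.7 (44.7)	48.7	52.0
MedMCQA	55.8	32.2 (51.4)	45.7	59.2
PubMedQA	79.4	67.8 (74.6)	75.8	74.4
MMLU Professional Medicine	76.1	19.5	65.8	26.6
MMLU Clinical Knowledge	70.6	23.8	61.1	35.5
MMLU Anatomy	65.9	18.5	52.6	42.6
MMLU College Medicine	63.0	23.7	55.5	28.9
MMLU Medical Genetics	71.0	32.0	68.0	46.0

*: PMC LLaMA 7b performed poorly on the benchmarks, likely because of a format mismatch and the lack of instruction tuning. Where available, the results reported by its authors are given in parentheses.

**: The Meditron 7b MMLU results are reported for transparency, but they are inconsistent with the average of 54.2 reported in the Meditron paper. If you have the per-category details, please share them so we can update the table.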
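
The size of the Meditron discrepancy can be checked by averaging the five MMLU rows of the table. The sketch below uses a plain unweighted mean. How the Meditron paper computed its 54.2 average (which subsets, which weighting) is not stated here, so the comparison is indicative only.

```python
from statistics import mean

# Five MMLU rows from the results table, in table order.
mmlu_scores = {
    "Internist.ai 7b": [76.1, 70.6, 65.9, 63.0, 71.0],
    "Mistral 7b": [65.8, 61.1, 52.6, 55.5, 68.0],
    "Meditron 7b": [26.6, 35.5, 42.6, 28.9, 46.0],
}

# Unweighted mean across the five subsets (an assumption about the averaging).
averages = {name: round(mean(scores), 2) for name, scores in mmlu_scores.items()}

for name, avg in averages.items():
    print(f"{name}: {avg}")

REPORTED_MEDITRON_AVG = 54.2  # figure cited in the footnote above
print("Gap vs. reported:", round(REPORTED_MEDITRON_AVG - averages["Meditron 7b"], 2))
```

On these rows the measured Meditron 7b average is about 35.9, roughly 18 points below the cited figure.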

📄 License

This model is released under the Apache License 2.0.

📚 Citation

BibTeX: If you use Internist.ai 7b, please cite us:

@article{10.1093/jamia/ocae120,
    author = {Griot, Maxime and Hemptinne, Coralie and Vanderdonckt, Jean and Yuksel, Demet},
    title = "{Impact of high-quality, mixed-domain data on the performance of medical language models}",
    journal = {Journal of the American Medical Informatics Association},
    volume = {31},
    number = {9},
    pages = {1875-1883},
    year = {2024},
    month = {05},
    abstract = "{To optimize the training strategy of large language models for medical applications, focusing on creating clinically relevant systems that efficiently integrate into healthcare settings, while ensuring high standards of accuracy and reliability. We curated a comprehensive collection of high-quality, domain-specific data and used it to train several models, each with different subsets of this data. These models were rigorously evaluated against standard medical benchmarks, such as the USMLE, to measure their performance. Furthermore, for a thorough effectiveness assessment, they were compared with other state-of-the-art medical models of comparable size. The models trained with a mix of high-quality, domain-specific, and general data showed superior performance over those trained on larger, less clinically relevant datasets (P < .001). Our 7-billion-parameter model Med5 scores 60.5\% on MedQA, outperforming the previous best of 49.3\% from comparable models, and becomes the first of its size to achieve a passing score on the USMLE. Additionally, this model retained its proficiency in general domain tasks, comparable to state-of-the-art general domain models of similar size. Our findings underscore the importance of integrating high-quality, domain-specific data in training large language models for medical purposes. The balanced approach between specialized and general data significantly enhances the model’s clinical relevance and performance. This study sets a new standard in medical language models, proving that a strategically trained, smaller model can outperform larger ones in clinical relevance and general proficiency, highlighting the importance of data quality and expert curation in generative artificial intelligence for healthcare applications.}",
    issn = {1527-974X},
    doi = {10.1093/jamia/ocae120},
    url = {https://doi.org/10.1093/jamia/ocae120},
    eprint = {https://academic.oup.com/jamia/article-pdf/31/9/1875/58868289/ocae120.pdf},
}