GENIE_en_8b open-source model: extract biomedical entities and their attributes from electronic health records for free

GENIE En 8b

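Developed by THUMedInfo

GENIE is an end-to-end model for structuring the free text of electronic health records (EHRs): it extracts biomedical named entities together with their associated attributes.

Large language model

Safetensors

Language: English | License: Apache-2.0 | #EHR-structuring #medical-information-extraction #no-prompt-engineering

Downloads: 141

Released: 11/19/2024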

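Model overview

GENIE processes an EHR in a single pass, extracting biomedical named entities along with their assertion status, body location, modifiers, values, units, and intended purpose, and outputs this information as structured JSON.

Model features

End-to-end processing

A single model handles the whole EHR, replacing the multiple analysis components of a traditional natural language processing workflow.

Structured output

The model directly generates structured JSON containing the biomedical named entities and their associated attributes.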
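As a rough illustration of the structured output described above, the sketch below checks that a parsed entity record carries the eight fields that appear in the example output on this card. `EXPECTED_FIELDS` and `missing_fields` are hypothetical names introduced here for illustration; they are not part of the GENIE release.

```python
# Field names copied from the example output on this card.
# EXPECTED_FIELDS and missing_fields are illustrative helpers (assumptions).
EXPECTED_FIELDS = (
    "phrase",
    "semantic_type",
    "assertion_status",
    "body_location",
    "modifier",
    "value",
    "unit",
    "purpose",
)


def missing_fields(record: dict) -> list:
    """Return the expected field names that are absent from a parsed record."""
    return [name for name in EXPECTED_FIELDS if name not in record]


sample = {
    "phrase": "gout",
    "semantic_type": "Disease, Syndrome or Pathologic Function",
    "assertion_status": "present",
    "body_location": "null",
    "modifier": "null",
    "value": "not applicable",
    "unit": "not applicable",
    "purpose": "not applicable",
}

print(missing_fields(sample))               # []
print(missing_fields({"phrase": "gout"}))   # the seven other field names
```

A check like this can be run on each record before it is stored, so that incomplete generations are caught early.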

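Efficient processing

All relevant attributes are generated in one pass, which significantly reduces runtime and operating cost.

No prompt engineering required

Unlike general-purpose LLMs, GENIE needs no prompt engineering or few-shot examples.

Model capabilities

EHR structuring

Biomedical named entity extraction

Assertion status recognition

Body location identification

Modifier extraction

Value and unit extraction

Intended purpose recognition

Use cases

Medical information management

EHR structuring

Extracts structured information from EHRs for medical data analysis and storage.

The output is structured JSON containing biomedical named entities and their attributes.

Clinical research

Patient record analysis

Automatically analyzes patient records and extracts key medical information for research.

Quickly identifies key information such as diseases, symptoms, and medications.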
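One way to surface diseases, symptoms, and medications from a parsed result is to group entity phrases by their `semantic_type`. The sketch below assumes records shaped like the example output on this card; `group_by_semantic_type` is a hypothetical helper, and the sample records are abbreviated to the two fields it needs.

```python
from collections import defaultdict


def group_by_semantic_type(entities):
    """Map each semantic type to the list of phrases extracted for it."""
    groups = defaultdict(list)
    for entity in entities:
        groups[entity["semantic_type"]].append(entity["phrase"])
    return dict(groups)


# Abbreviated records; the semantic type labels come from the example output.
entities = [
    {"phrase": "gout", "semantic_type": "Disease, Syndrome or Pathologic Function"},
    {"phrase": "nausea", "semantic_type": "Sign, Symptom, or Finding"},
    {"phrase": "norvasc", "semantic_type": "Chemical or Drug"},
    {"phrase": "vomiting", "semantic_type": "Sign, Symptom, or Finding"},
]

groups = group_by_semantic_type(entities)
print(groups["Sign, Symptom, or Finding"])  # ['nausea', 'vomiting']
print(groups["Chemical or Drug"])           # ['norvasc']
```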

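🚀 GENIE Model Card

GENIE (Generative Note Information Extraction) is an end-to-end model built to structure the free text of electronic health records (EHRs). It processes an EHR in a single pass, extracting biomedical named entities along with their assertion status, body location, modifiers, values, units, and intended purpose, and outputs the information as structured JSON.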

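Model details

Attribute	Details
Model size	8 billion parameters (English)
Max tokens	8192
Base model	Llama 3.1 8B (English)
Model type	Feature extraction
Tags	structuring, electronic health records, medical, information extraction
License	Apache-2.0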

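Model description

GENIE (Generative Note Information Extraction) is an end-to-end model designed to structure the free text of electronic health records (EHRs). It processes an EHR in a single pass, extracting biomedical named entities along with their assertion status, body location, modifiers, values, units, and intended purpose, and outputs this information as structured JSON. Replacing all analysis components with a single model streamlines the traditional natural language processing workflow and makes the system easier to maintain, while drawing on the analytical capabilities of large language models (LLMs). Unlike general-purpose LLMs, GENIE needs no prompt engineering or few-shot examples. It also generates all relevant attributes in one pass, which significantly reduces runtime and operating cost. GENIE was developed jointly by the teams of Sheng Yu (https://www.stat.tsinghua.edu.cn/teachers/shengyu/), Tianxi Cai (https://dbmi.hms.harvard.edu/people/tianxi-cai), and Isaac Kohane (https://dbmi.hms.harvard.edu/people/isaac-kohane).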

🚀 Quick start

💻 Usage example

Basic usage

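import json

from vllm import LLM, SamplingParams

model = LLM(model='THUMedInfo/GENIE_en_8b', tensor_parallel_size=1)
# model = LLM(model='path/to/your/local/model', tensor_parallel_size=1)

PROMPT_TEMPLATE = "Human:\n{query}\n\n Assistant:"

# The original snippet leaves these two values undefined; the numbers below
# are placeholder assumptions, not recommended settings.
temperature = 0.0
max_new_token = 4096
sampling_params = SamplingParams(temperature=temperature, max_tokens=max_new_token)

EHR = ['xxxxx1', 'xxxxx2']  # one string per clinical note
texts = [PROMPT_TEMPLATE.format(query=k) for k in EHR]
output = model.generate(texts, sampling_params)

# Parse the JSON generated for the first note into a list of entity dicts.
res = json.loads(output[0].outputs[0].text)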
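The snippet above parses only the first generation and would raise an exception if the text is not valid JSON. The following is a minimal sketch, under the assumption that an occasional generation may be truncated or malformed, of building the prompts and parsing every generation defensively. `build_prompts` and `parse_generations` are hypothetical helper names; only the prompt template is taken from this card.

```python
import json

# Prompt template as given in the basic usage snippet on this card.
PROMPT_TEMPLATE = "Human:\n{query}\n\n Assistant:"


def build_prompts(notes):
    """Wrap each clinical note in the prompt template."""
    return [PROMPT_TEMPLATE.format(query=note) for note in notes]


def parse_generations(generated_texts):
    """Parse each generated string as JSON; use None where decoding fails."""
    results = []
    for text in generated_texts:
        try:
            results.append(json.loads(text))
        except json.JSONDecodeError:
            # Keep positions aligned with the input notes.
            results.append(None)
    return results


prompts = build_prompts(["abdominal pain for 2 days"])
parsed = parse_generations(['[{"phrase": "gout"}]', '[{"phrase": '])
print(parsed)  # [[{'phrase': 'gout'}], None]
```

With the vLLM objects from the basic usage snippet, the generated strings could be collected as `[o.outputs[0].text for o in output]` and passed to `parse_generations`.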

Example input and output

Input:

EHR = ["""Unit No:___

Admission Date:___

Discharge Date:___

Date of Birth:___

Sex:   F

Service: MEDICINE

Allergies:
Sulfur / Norvasc

Attending:___
Addendum:
See below

Chief Complaint:
abdominal pain

Major Surgical or Invasive Procedure:
none

History of Present Illness:
84 F with PMHx of Renovascular HTN c/b NSTEMI now s/p renal
stents, Gout and h/o Crohn's disease who presented to the ED on
___with RLQ pain for approx 2 days.  She denies any
nausea/vomiting/diarrhea or constipation but has not been taking
po well and felt dehydrated."""]

Output:

res = [{'phrase': 'allergies',
  'semantic_type': 'Disease, Syndrome or Pathologic Function',
  'assertion_status': 'title',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'sulfur',
  'semantic_type': 'Chemical or Drug',
  'assertion_status': 'conditional',
  'body_location': 'not applicable',
  'modifier': 'not applicable',
  'value': 'null',
  'unit': 'units: null',
  'purpose': 'null'},
 {'phrase': 'norvasc',
  'semantic_type': 'Chemical or Drug',
  'assertion_status': 'conditional',
  'body_location': 'not applicable',
  'modifier': 'not applicable',
  'value': 'null',
  'unit': 'units: null',
  'purpose': 'null'},
 {'phrase': 'abdominal pain',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'present',
  'body_location': 'Abdominal',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'surgical or invasive procedure',
  'semantic_type': 'Therapeutic or Preventive Procedure',
  'assertion_status': 'title',
  'body_location': 'null',
  'modifier': 'not applicable',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'null'},
 {'phrase': 'renovascular hypertension',
  'semantic_type': 'Disease, Syndrome or Pathologic Function',
  'assertion_status': 'present',
  'body_location': 'renal',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'non-st elevation myocardial infarction',
  'semantic_type': 'Disease, Syndrome or Pathologic Function',
  'assertion_status': 'present',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'gout',
  'semantic_type': 'Disease, Syndrome or Pathologic Function',
  'assertion_status': 'present',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': "crohn ' s disease",
  'semantic_type': 'Disease, Syndrome or Pathologic Function',
  'assertion_status': 'present',
  'body_location': 'not applicable',
  'modifier': 'not applicable',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'emergency department',
  'semantic_type': 'Therapeutic or Preventive Procedure',
  'assertion_status': 'present',
  'body_location': 'null',
  'modifier': 'not applicable',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'null'},
 {'phrase': 'pain',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'present',
  'body_location': 'right lower quadrant',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'nausea',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'absent',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'vomiting',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'absent',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'diarrhea',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'absent',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'constipation',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'absent',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'}]
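In the output above, missing or inapplicable attributes appear as the strings 'null' and 'not applicable'. A small post-processing sketch, assuming those two strings are the only placeholders worth normalizing, converts them to `None` and selects entities by assertion status. `PLACEHOLDERS`, `clean_entity`, and `with_status` are hypothetical names; values such as 'units: null' are left untouched because their meaning is not documented here.

```python
# Placeholder strings observed in the example output (assumed to mean "no value").
PLACEHOLDERS = {"null", "not applicable"}


def clean_entity(entity):
    """Return a copy of the entity with placeholder strings replaced by None."""
    return {
        key: (None if value in PLACEHOLDERS else value)
        for key, value in entity.items()
    }


def with_status(entities, status):
    """Keep only the entities whose assertion_status equals `status`."""
    return [e for e in entities if e["assertion_status"] == status]


# Two records abbreviated from the example output.
entities = [
    {"phrase": "pain", "assertion_status": "present",
     "body_location": "right lower quadrant", "value": "not applicable"},
    {"phrase": "nausea", "assertion_status": "absent",
     "body_location": "null", "value": "not applicable"},
]

cleaned = [clean_entity(e) for e in entities]
absent = [e["phrase"] for e in with_status(cleaned, "absent")]
print(absent)                       # ['nausea']
print(cleaned[0]["body_location"])  # right lower quadrant
print(cleaned[1]["body_location"])  # None
```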

📄 Citation

If you find our paper or model helpful, please consider citing:

@misc{ying2025geniegenerativenoteinformation,
      title={GENIE: Generative Note Information Extraction model for structuring EHR data}, 
      author={Huaiyuan Ying and Hongyi Yuan and Jinsen Lu and Zitian Qu and Yang Zhao and Zhengyun Zhao and Isaac Kohane and Tianxi Cai and Sheng Yu},
      year={2025},
      eprint={2501.18435},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.18435}, 
}