GENIE_en_8b open-source model: free extraction of biomedical entities and their attributes from electronic health records

GENIE En 8b

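Developed by THUMedInfo

GENIE is an end-to-end model that structures free text from electronic health records (EHRs), extracting biomedical named entities together with their associated attributes.

Large language model

Safetensors

Language: English  License: Apache-2.0  #EHR structuring #medical information extraction #no prompt engineering

Downloads: 141

Released: 11/19/2024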

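Model overview

GENIE processes an EHR in a single pass, extracts biomedical named entities along with their assertion status, body location, modifiers, values, units, and intended purpose, and outputs this information as structured JSON.

Model features

End-to-end processing

A single model handles the EHR, which simplifies the traditional natural language processing workflow and removes the need for multiple analysis components.

Structured output

Directly generates structured JSON output containing biomedical named entities and their associated attributes.

Efficient processing

Generates all relevant attributes in one pass, which significantly reduces runtime and operating cost.

No prompt engineering

Unlike general-purpose LLMs, GENIE requires no prompt engineering or few-shot examples.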

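Model capabilities

EHR structuring

Biomedical named entity extraction

Assertion status recognition

Body location identification

Modifier extraction

Value and unit extraction

Intended purpose recognition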

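Use cases

Medical information management

EHR structuring

Extracts structured information from electronic health records for medical data analysis and storage.

The output is structured JSON containing biomedical named entities and their attributes.

Clinical research

Patient record analysis

Automatically analyzes patient records and extracts key medical information for research.

Quickly identifies key information such as diseases, symptoms, and drugs.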

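🚀 GENIE model card

GENIE (Generative Note Information Extraction) is an end-to-end model designed to structure free text from electronic health records (EHRs). It processes an EHR in a single pass, extracts biomedical named entities along with their assertion status, body location, modifiers, values, units, and intended purpose, and outputs the information as structured JSON.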

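Model details

Attribute	Details
Model size	8 billion parameters (English)
Max tokens	8192
Base model	Llama 3.1 8B (English)
Model type	Feature extraction
Tags	structuring, electronic health records, medicine, information extraction
License	Apache-2.0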

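Model description

GENIE (Generative Note Information Extraction) is an end-to-end model designed to structure free text from electronic health records (EHRs). It processes an EHR in a single pass, extracts biomedical named entities along with their assertion status, body location, modifiers, values, units, and intended purpose, and outputs this information as structured JSON. This streamlined approach replaces all the analysis components of a traditional natural language processing workflow with a single model, which makes the system easier to maintain while leveraging the advanced analytical capabilities of large language models (LLMs). Unlike general-purpose LLMs, GENIE requires no prompt engineering or few-shot examples. It also generates all relevant attributes in one pass, significantly reducing runtime and operating cost. GENIE was jointly developed by the teams of Sheng Yu (https://www.stat.tsinghua.edu.cn/teachers/shengyu/), Tianxi Cai (https://dbmi.hms.harvard.edu/people/tianxi-cai), and Isaac Kohane (https://dbmi.hms.harvard.edu/people/isaac-kohane).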
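The attributes listed in the description can be written down as a record type. The following is a minimal sketch, assuming each output record carries exactly the eight keys that appear in the example output later in this card; the names `GenieEntity`, `EXPECTED_KEYS`, and `missing_keys` are hypothetical helpers, not part of the released model or any library.

```python
from typing import TypedDict


class GenieEntity(TypedDict):
    """One extracted entity, mirroring the keys seen in the card's example output."""
    phrase: str
    semantic_type: str
    assertion_status: str
    body_location: str
    modifier: str
    value: str
    unit: str
    purpose: str


# Keys every record is assumed to contain (an assumption based on the example output).
EXPECTED_KEYS = frozenset(GenieEntity.__annotations__)


def missing_keys(record: dict) -> set:
    """Return the expected keys that are absent from a record."""
    return set(EXPECTED_KEYS) - set(record)


example = {
    "phrase": "gout",
    "semantic_type": "Disease, Syndrome or Pathologic Function",
    "assertion_status": "present",
    "body_location": "null",
    "modifier": "null",
    "value": "not applicable",
    "unit": "not applicable",
    "purpose": "not applicable",
}
print(missing_keys(example))  # set()
```

A check like this may be useful before storing records, since nothing in the card guarantees that every generation contains all keys.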

🚀 Quick start

💻 Usage example

Basic usage

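import json

from vllm import LLM, SamplingParams

# Load from the Hugging Face Hub, or pass a local path instead.
model = LLM(model='THUMedInfo/GENIE_en_8b', tensor_parallel_size=1)
# model = LLM(model='path/to/your/local/model', tensor_parallel_size=1)

# The original snippet left these two values undefined. The numbers below are
# illustrative placeholders, not settings recommended by the authors.
temperature = 0.7
max_new_token = 2048

PROMPT_TEMPLATE = "Human:\n{query}\n\n Assistant:"
sampling_params = SamplingParams(temperature=temperature, max_tokens=max_new_token)
EHR = ['xxxxx1', 'xxxxx2']  # replace with your own note texts
texts = [PROMPT_TEMPLATE.format(query=k) for k in EHR]
output = model.generate(texts, sampling_params)
# Parse the structured JSON generated for the first note.
res = json.loads(output[0].outputs[0].text)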
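The snippet above parses only the first generation and assumes it is valid JSON. As a hedged sketch, the prompt construction and parsing steps can be separated into small functions that tolerate malformed output, for instance if a generation is cut off at the token limit. `build_prompts` and `parse_generation` are hypothetical names introduced here; only `PROMPT_TEMPLATE` comes from the card, and the assumption that valid output is a JSON list is based on the example output.

```python
import json

PROMPT_TEMPLATE = "Human:\n{query}\n\n Assistant:"


def build_prompts(notes):
    """Wrap each raw note in the card's prompt template."""
    return [PROMPT_TEMPLATE.format(query=note) for note in notes]


def parse_generation(text):
    """Parse one generated string into a list of entity dicts.

    Returns None when the text is not valid JSON or is not a JSON list,
    so the caller can decide whether to retry or skip the note.
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, list) else None


prompts = build_prompts(["Chief Complaint:\nabdominal pain"])
complete = parse_generation('[{"phrase": "abdominal pain"}]')
truncated = parse_generation('[{"phrase": "abdominal pa')
print(complete)   # [{'phrase': 'abdominal pain'}]
print(truncated)  # None
```

With vLLM, each element of `output` would be passed through `parse_generation(o.outputs[0].text)` instead of calling `json.loads` directly.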

Example input and output

Input:

EHR = ["""Unit No:___

Admission Date:___

Discharge Date:___

Date of Birth:___

Sex:   F

Service: MEDICINE

Allergies:
Sulfur / Norvasc

Attending:___
Addendum:
See below

Chief Complaint:
abdominal pain

Major Surgical or Invasive Procedure:
none

History of Present Illness:
84 F with PMHx of Renovascular HTN c/b NSTEMI now s/p renal
stents, Gout and h/o Crohn's disease who presented to the ED on
___with RLQ pain for approx 2 days.  She denies any
nausea/vomiting/diarrhea or constipation but has not been taking
po well and felt dehydrated."""]

Output:

res = [{'phrase': 'allergies',
  'semantic_type': 'Disease, Syndrome or Pathologic Function',
  'assertion_status': 'title',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'sulfur',
  'semantic_type': 'Chemical or Drug',
  'assertion_status': 'conditional',
  'body_location': 'not applicable',
  'modifier': 'not applicable',
  'value': 'null',
  'unit': 'units: null',
  'purpose': 'null'},
 {'phrase': 'norvasc',
  'semantic_type': 'Chemical or Drug',
  'assertion_status': 'conditional',
  'body_location': 'not applicable',
  'modifier': 'not applicable',
  'value': 'null',
  'unit': 'units: null',
  'purpose': 'null'},
 {'phrase': 'abdominal pain',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'present',
  'body_location': 'Abdominal',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'surgical or invasive procedure',
  'semantic_type': 'Therapeutic or Preventive Procedure',
  'assertion_status': 'title',
  'body_location': 'null',
  'modifier': 'not applicable',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'null'},
 {'phrase': 'renovascular hypertension',
  'semantic_type': 'Disease, Syndrome or Pathologic Function',
  'assertion_status': 'present',
  'body_location': 'renal',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'non-st elevation myocardial infarction',
  'semantic_type': 'Disease, Syndrome or Pathologic Function',
  'assertion_status': 'present',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'gout',
  'semantic_type': 'Disease, Syndrome or Pathologic Function',
  'assertion_status': 'present',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': "crohn ' s disease",
  'semantic_type': 'Disease, Syndrome or Pathologic Function',
  'assertion_status': 'present',
  'body_location': 'not applicable',
  'modifier': 'not applicable',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'emergency department',
  'semantic_type': 'Therapeutic or Preventive Procedure',
  'assertion_status': 'present',
  'body_location': 'null',
  'modifier': 'not applicable',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'null'},
 {'phrase': 'pain',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'present',
  'body_location': 'right lower quadrant',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'nausea',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'absent',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'vomiting',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'absent',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'diarrhea',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'absent',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'constipation',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'absent',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'}]
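Records like those above usually need light post-processing before analysis. The sketch below is one possible approach, not the authors' pipeline: it maps the placeholder strings seen in the example (`'null'`, `'not applicable'`, `'units: null'`) to `None` and groups phrases by `assertion_status`. Treating all three placeholders as equivalent is an assumption, since the card does not define how they differ; `clean` and `group_by_assertion` are hypothetical helper names.

```python
from collections import defaultdict

# Placeholder strings observed in the example output (assumed to mean "no value").
PLACEHOLDERS = {"null", "not applicable", "units: null"}


def clean(record):
    """Return a copy of the record with placeholder strings replaced by None."""
    return {k: (None if v in PLACEHOLDERS else v) for k, v in record.items()}


def group_by_assertion(records):
    """Map each assertion status to the list of phrases carrying it."""
    groups = defaultdict(list)
    for record in records:
        groups[record["assertion_status"]].append(record["phrase"])
    return dict(groups)


# Three records copied from the example output above.
sample = [
    {"phrase": "pain", "semantic_type": "Sign, Symptom, or Finding",
     "assertion_status": "present", "body_location": "right lower quadrant",
     "modifier": "null", "value": "not applicable",
     "unit": "not applicable", "purpose": "not applicable"},
    {"phrase": "nausea", "semantic_type": "Sign, Symptom, or Finding",
     "assertion_status": "absent", "body_location": "null",
     "modifier": "null", "value": "not applicable",
     "unit": "not applicable", "purpose": "not applicable"},
    {"phrase": "sulfur", "semantic_type": "Chemical or Drug",
     "assertion_status": "conditional", "body_location": "not applicable",
     "modifier": "not applicable", "value": "null",
     "unit": "units: null", "purpose": "null"},
]

print(group_by_assertion(sample))
# {'present': ['pain'], 'absent': ['nausea'], 'conditional': ['sulfur']}
print(clean(sample[0])["body_location"])  # right lower quadrant
```

Keeping the original records alongside the cleaned copies preserves the distinction between placeholders in case it turns out to matter downstream.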

📄 Citation

If you find our paper or model helpful, please consider citing:

@misc{ying2025geniegenerativenoteinformation,
      title={GENIE: Generative Note Information Extraction model for structuring EHR data}, 
      author={Huaiyuan Ying and Hongyi Yuan and Jinsen Lu and Zitian Qu and Yang Zhao and Zhengyun Zhao and Isaac Kohane and Tianxi Cai and Sheng Yu},
      year={2025},
      eprint={2501.18435},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.18435}, 
}