GENIE_en_8b Open-source Model - Free Extraction of Biomedical Entities and Attributes from Electronic Health Records

GENIE En 8b

Developed by THUMedInfo

GENIE is an end-to-end model specifically designed for structuring free text from Electronic Health Records (EHR) to extract biomedical named entities and their related attributes.

Large Language Model

Safetensors

Language: English
Open Source License: Apache-2.0
Tags: #Electronic Health Record Structuring #Medical Information Extraction #Prompt-Free Engineering

Downloads 141

Release Time: 11/19/2024

Model Overview

GENIE processes EHR in a single pass to extract biomedical named entities along with their assertion status, body parts, modifiers, values, units, and intended purposes, outputting this information in a structured JSON format.

Model Features

End-to-End Processing

Simplifies traditional NLP workflows by processing EHR with a single model, eliminating the need for multiple analysis components.

Structured Output

Directly generates structured JSON output containing biomedical named entities and their related attributes.

Efficient Processing

Generates all relevant attributes in one pass, significantly reducing runtime and operational costs.

Prompt-Free Engineering

Unlike general-purpose LLMs, GENIE requires no prompt engineering or few-shot examples.

Model Capabilities

Electronic Health Record Structuring

Biomedical Named Entity Extraction

Assertion Status Recognition

Body Part Localization

Modifier Extraction

Value and Unit Extraction

Intended Purpose Recognition

Use Cases

Medical Information Management

EHR Structuring

Extracts structured information from electronic health records for medical data analysis and storage.

Outputs structured JSON data containing biomedical named entities and their attributes.
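For the storage use case, one possible approach is to flatten the extracted entities into CSV rows. The sketch below is illustrative only: the helper name `entities_to_csv` is hypothetical, and the column names are assumed to match the keys that appear in the sample output of this card.

```python
import csv
import io

# Keys as they appear in the sample output of this card (assumed stable).
FIELDS = ["phrase", "semantic_type", "assertion_status", "body_location",
          "modifier", "value", "unit", "purpose"]


def entities_to_csv(entities):
    """Serialize a list of entity dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=FIELDS,
        restval="",             # fill keys the model omitted with an empty cell
        extrasaction="ignore",  # drop keys that are not in FIELDS
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(entities)
    return buffer.getvalue()


sample = [{
    "phrase": "gout",
    "semantic_type": "Disease, Syndrome or Pathologic Function",
    "assertion_status": "present",
    "body_location": "null",
    "modifier": "null",
    "value": "not applicable",
    "unit": "not applicable",
    "purpose": "not applicable",
}]
csv_text = entities_to_csv(sample)
```

Values that contain commas, such as the semantic type above, are quoted automatically by the `csv` module.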

Clinical Research

Patient Record Analysis

Automatically analyzes patient records to extract key medical information for research purposes.

Rapidly identifies critical information such as diseases, symptoms, and medications.

🚀 GENIE Model Card

GENIE (Generative Note Information Extraction) is an end-to-end model that simplifies the structuring of free text from electronic health records (EHRs). It efficiently extracts key biomedical information and outputs it in a structured JSON format, reducing costs and streamlining workflows.

🚀 Quick Start

Prerequisites

Make sure you have the vllm library installed.

Code Example

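import json

from vllm import LLM, SamplingParams

# Load the model from the Hugging Face Hub (or pass a local path as a string).
model = LLM(model='THUMedInfo/GENIE_en_8b', tensor_parallel_size=1)
# model = LLM(model='path/to/your/local/model', tensor_parallel_size=1)

PROMPT_TEMPLATE = "Human:\n{query}\n\n Assistant:"

# Placeholder values (not specified in the original card); adjust as needed.
temperature = 0.0
max_new_token = 4096
sampling_params = SamplingParams(temperature=temperature, max_tokens=max_new_token)

EHR = ['xxxxx1', 'xxxxx2']  # replace with your EHR note texts
texts = [PROMPT_TEMPLATE.format(query=k) for k in EHR]
output = model.generate(texts, sampling_params)

# Parse the JSON generated for the first note.
res = json.loads(output[0].outputs[0].text)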
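Generated text is not guaranteed to be valid JSON, so a defensive parsing step may be worth adding around the `json.loads` call. The following is a minimal sketch, not part of the original card: `build_prompts` and `parse_entities` are hypothetical helper names, and the choice to return `None` on failure is an assumption about how a caller would want to handle bad output.

```python
import json
from typing import Any, Optional

# Prompt template taken from the Quick Start example.
PROMPT_TEMPLATE = "Human:\n{query}\n\n Assistant:"


def build_prompts(notes: list[str]) -> list[str]:
    """Wrap each raw note in the prompt template."""
    return [PROMPT_TEMPLATE.format(query=note) for note in notes]


def parse_entities(raw_text: str) -> Optional[list[dict[str, Any]]]:
    """Decode generated text; return None unless it is a JSON list of objects."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, list):
        return None
    if not all(isinstance(item, dict) for item in data):
        return None
    return data
```

With the Quick Start variables, this could be applied as `results = [parse_entities(o.outputs[0].text) for o in output]`, leaving `None` in place of any note whose output could not be parsed.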

✨ Features

End-to-End Structuring: Processes EHRs in a single pass, outputting structured JSON data.
Reduced Complexity: Replaces multiple analysis components with a single model, making the system easier to maintain.
Cost-Effective: Unlike general-purpose LLMs, requires no prompt engineering or few-shot examples, and generates all attributes in one pass, reducing runtime and operational costs.

📚 Documentation

Model Details

Property	Details
Model Size	8B (English)
Max Tokens	8192
Base model	Llama 3.1 8B (English)

Model Description

GENIE (Generative Note Information Extraction) is an end-to-end model designed to structure free text from electronic health records (EHRs). It processes EHRs in a single pass, extracting biomedical named entities along with their assertion statuses, body locations, modifiers, values, units, and intended purposes, and outputs this information in a structured JSON format. This streamlined approach simplifies traditional natural language processing workflows by replacing all the analysis components with a single model, making the system easier to maintain while leveraging the advanced analytical capabilities of large language models (LLMs). Compared with general-purpose LLMs, GENIE does not require prompt engineering or few-shot examples. Additionally, it generates all relevant attributes in one pass, significantly reducing both runtime and operational costs.
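The attributes listed above can be mirrored in a small typed record on the consuming side. This is a sketch under assumptions: the `Entity` class is hypothetical, and its field names are copied from the keys in the sample output of this card rather than from a published schema.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Entity:
    """One extracted entity; field names follow the card's sample output."""
    phrase: str
    semantic_type: str
    assertion_status: str
    body_location: str
    modifier: str
    value: str
    unit: str
    purpose: str

    @classmethod
    def from_dict(cls, record: dict) -> "Entity":
        """Build an Entity, raising ValueError if any expected key is absent."""
        expected = [f.name for f in fields(cls)]
        missing = sorted(set(expected) - set(record))
        if missing:
            raise ValueError(f"missing keys: {missing}")
        # Extra keys are ignored; values are kept as strings, as in the sample.
        return cls(**{name: str(record[name]) for name in expected})


entity = Entity.from_dict({
    "phrase": "pain",
    "semantic_type": "Sign, Symptom, or Finding",
    "assertion_status": "present",
    "body_location": "right lower quadrant",
    "modifier": "null",
    "value": "not applicable",
    "unit": "not applicable",
    "purpose": "not applicable",
})
```

Validating keys at this boundary makes a malformed record fail early instead of surfacing later as a `KeyError` in analysis code.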

GENIE is co-developed by the groups of Sheng Yu (https://www.stat.tsinghua.edu.cn/teachers/shengyu/), Tianxi Cai (https://dbmi.hms.harvard.edu/people/tianxi-cai), and Isaac Kohane (https://dbmi.hms.harvard.edu/people/isaac-kohane).

💻 Usage Examples

Example Input and Output

Input

EHR = ["""Unit No:___

Admission Date:___

Discharge Date:___

Date of Birth:___

Sex:   F

Service: MEDICINE

Allergies:
Sulfur / Norvasc

Attending:___
Addendum:
See below

Chief Complaint:
abdominal pain

Major Surgical or Invasive Procedure:
none

History of Present Illness:
84 F with PMHx of Renovascular HTN c/b NSTEMI now s/p renal
stents, Gout and h/o Crohn's disease who presented to the ED on
___with RLQ pain for approx 2 days.  She denies any
nausea/vomiting/diarrhea or constipation but has not been taking

po well and felt dehydrated."""]

Output

res = [{'phrase': 'allergies',
  'semantic_type': 'Disease, Syndrome or Pathologic Function',
  'assertion_status': 'title',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'sulfur',
  'semantic_type': 'Chemical or Drug',
  'assertion_status': 'conditional',
  'body_location': 'not applicable',
  'modifier': 'not applicable',
  'value': 'null',
  'unit': 'units: null',
  'purpose': 'null'},
 {'phrase': 'norvasc',
  'semantic_type': 'Chemical or Drug',
  'assertion_status': 'conditional',
  'body_location': 'not applicable',
  'modifier': 'not applicable',
  'value': 'null',
  'unit': 'units: null',
  'purpose': 'null'},
 {'phrase': 'abdominal pain',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'present',
  'body_location': 'Abdominal',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'surgical or invasive procedure',
  'semantic_type': 'Therapeutic or Preventive Procedure',
  'assertion_status': 'title',
  'body_location': 'null',
  'modifier': 'not applicable',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'null'},
 {'phrase': 'renovascular hypertension',
  'semantic_type': 'Disease, Syndrome or Pathologic Function',
  'assertion_status': 'present',
  'body_location': 'renal',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'non - st elevation myocardial infarction',
  'semantic_type': 'Disease, Syndrome or Pathologic Function',
  'assertion_status': 'present',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'gout',
  'semantic_type': 'Disease, Syndrome or Pathologic Function',
  'assertion_status': 'present',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': "crohn ' s disease",
  'semantic_type': 'Disease, Syndrome or Pathologic Function',
  'assertion_status': 'present',
  'body_location': 'not applicable',
  'modifier': 'not applicable',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'emergency department',
  'semantic_type': 'Therapeutic or Preventive Procedure',
  'assertion_status': 'present',
  'body_location': 'null',
  'modifier': 'not applicable',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'null'},
 {'phrase': 'pain',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'present',
  'body_location': 'right lower quadrant',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'nausea',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'absent',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'vomiting',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'absent',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'diarrhea',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'absent',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'constipation',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'absent',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'}]
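Once the output is parsed, a common next step is to group entities by one of their attributes. The sketch below groups phrases by `assertion_status`; the helper name `group_phrases` is hypothetical, and the sample records are abbreviated copies of entries from the output above.

```python
from collections import defaultdict


def group_phrases(entities, key="assertion_status"):
    """Map each distinct value of `key` to the phrases that carry it."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity[key]].append(entity["phrase"])
    return dict(grouped)


# Abbreviated records taken from the sample output above.
sample = [
    {"phrase": "allergies", "assertion_status": "title"},
    {"phrase": "abdominal pain", "assertion_status": "present"},
    {"phrase": "nausea", "assertion_status": "absent"},
    {"phrase": "vomiting", "assertion_status": "absent"},
]
by_status = group_phrases(sample)
```

Passing `key="semantic_type"` on full records would group the same phrases by category instead. In the sample, the `title` status appears on section headings such as "allergies", so a caller may want to exclude that group before counting findings.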

📄 License

This project is licensed under the Apache-2.0 license.

📚 Citation

If you find our paper or models helpful, please consider citing:

BibTeX:

@misc{ying2025geniegenerativenoteinformation,
      title={GENIE: Generative Note Information Extraction model for structuring EHR data}, 
      author={Huaiyuan Ying and Hongyi Yuan and Jinsen Lu and Zitian Qu and Yang Zhao and Zhengyun Zhao and Isaac Kohane and Tianxi Cai and Sheng Yu},
      year={2025},
      eprint={2501.18435},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.18435}, 
}
