# KRISSBERT
KRISSBERT is a contextual encoder for entity linking. It tackles the core challenges of the task, prolific name variations and prevalent ambiguities, by leveraging Knowledge-RIch Self-Supervision (KRISS) over readily available unlabeled text and domain knowledge. KRISSBERT outperforms prior self-supervised methods on biomedical entity linking tasks.
## Quick Start
The following steps show how to use KRISSBERT for entity linking with the MedMentions dataset.
### Installation
1. Create a conda environment and install the requirements:

   ```shell
   conda create -n kriss -y python=3.8 && conda activate kriss
   pip install -r requirements.txt
   ```

2. Change into the `usage` directory:

   ```shell
   cd usage
   ```

3. Download the MedMentions dataset:

   ```shell
   git clone https://github.com/chanzuckerberg/MedMentions.git
   ```
### Usage Examples
1. Generate prototype embeddings:

   ```shell
   python generate_prototypes.py
   ```

2. Run entity linking:

   ```shell
   python run_entity_linking.py
   ```

This yields about 58.3% top-1 accuracy on MedMentions.
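Conceptually, the linking step ranks candidate entities by the similarity between a mention embedding and the prototype embeddings. The following is a minimal sketch of that idea only, not the project's actual code; the prototype vectors, mention vector, and CUI assignments are made-up toy values:

```python
import numpy as np

# Toy prototype embeddings: one row per entity prototype, with the
# UMLS CUI each prototype belongs to (illustrative values only).
prototypes = np.array([
    [0.9, 0.1, 0.0],   # prototype for CUI C0004057 (aspirin)
    [0.1, 0.9, 0.0],   # prototype for CUI C0011849 (diabetes mellitus)
    [0.0, 0.2, 0.9],   # prototype for CUI C0020538 (hypertension)
])
cuis = ["C0004057", "C0011849", "C0020538"]

def link_mention(mention_vec, prototypes, cuis):
    """Return the CUI whose prototype is most cosine-similar to the mention."""
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    m = mention_vec / np.linalg.norm(mention_vec)
    scores = p @ m                       # cosine similarity to every prototype
    return cuis[int(np.argmax(scores))]

# A mention embedding as produced by the encoder (toy value).
mention = np.array([0.85, 0.15, 0.05])
print(link_mention(mention, prototypes, cuis))  # -> C0004057
```

In the real pipeline, `generate_prototypes.py` builds the prototype matrix and `run_entity_linking.py` performs the nearest-neighbor search over it.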
## Features
- Knowledge-Rich Self-Supervision: KRISSBERT leverages readily available unlabeled text and domain knowledge for self-supervision, which helps in handling entity linking challenges such as prolific variations and prevalent ambiguities.
- Context-Aware: Unlike some prior systems, KRISSBERT takes into account the context of an entity mention, enabling it to disambiguate ambiguous mentions more effectively.
- State-of-the-Art Performance: Experiments on seven standard biomedical entity linking datasets show that KRISSBERT attains a new state of the art, outperforming prior self-supervised methods by as much as 20 absolute points in accuracy.
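Context awareness hinges on encoding the mention together with its surrounding sentence rather than the surface form alone. A minimal sketch of that preprocessing idea, using hypothetical `[Ms]`/`[Me]` marker tokens (the actual marker strings are defined by the project's code, not here):

```python
def mark_mention(text, start, end, start_tag="[Ms]", end_tag="[Me]"):
    """Insert marker tokens around the mention span so the encoder
    sees both the surface form and its surrounding context."""
    return text[:start] + start_tag + " " + text[start:end] + " " + end_tag + text[end:]

sentence = "The patient was given ms for pain relief."
# "ms" alone is ambiguous (morphine sulfate, multiple sclerosis, ...);
# the marked sentence gives the encoder the disambiguating context.
marked = mark_mention(sentence, 22, 24)
print(marked)
# -> The patient was given [Ms] ms [Me] for pain relief.
```

A context-free system that only matches "ms" against an entity dictionary has no way to choose among the candidate entities; the marked, contextual input does.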
## Documentation
Entity linking faces significant challenges such as prolific variations and prevalent ambiguities, especially in high-value domains with myriad entities. Standard classification approaches suffer from the annotation bottleneck and cannot effectively handle unseen entities. Zero-shot entity linking has emerged as a promising direction for generalizing to new entities, but it still requires example gold entity mentions during training and canonical descriptions for all entities, both of which are rarely available outside of Wikipedia (Logeswaran et al., 2019; Wu et al., 2020).
Specifically, the KRISSBERT model is initialized with PubMedBERT parameters, and then continuously pretrained using biomedical entity names from the UMLS ontology to self-supervise entity linking examples from PubMed abstracts.
Some prior systems like BioSyn, SapBERT, and their follow-up work (e.g., Lai et al., 2021) claimed to do entity linking, but they completely ignore the context of an entity mention, and can only predict a surface form in the entity dictionary, not the canonical entity ID (e.g., CUI in UMLS). Therefore, they cannot disambiguate ambiguous mentions.
## Technical Details
The KRISSBERT model is initialized with the parameters of PubMedBERT. Then, it is continuously pretrained using biomedical entity names from the UMLS ontology. The self-supervision process uses entity linking examples from PubMed abstracts.
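The self-supervision step can be pictured as mining mention examples by matching known UMLS entity names against abstract text. The sketch below is a simplified illustration of that idea; the dictionary entries and the naive string-matching strategy are assumptions for demonstration, not the project's actual pipeline:

```python
import re

# Toy slice of a UMLS-style name -> CUI dictionary (illustrative only).
umls_names = {
    "diabetes mellitus": "C0011849",
    "aspirin": "C0004057",
    "hypertension": "C0020538",
}

def mine_examples(abstract, name_to_cui):
    """Find dictionary names in the text and emit (start, end, surface, CUI)
    tuples that can serve as self-supervised entity-linking examples."""
    examples = []
    for name, cui in name_to_cui.items():
        for m in re.finditer(re.escape(name), abstract, flags=re.IGNORECASE):
            examples.append((m.start(), m.end(), m.group(0), cui))
    return sorted(examples)

abstract = "Aspirin is often prescribed alongside treatment for hypertension."
for example in mine_examples(abstract, umls_names):
    print(example)
```

Each mined mention, together with its surrounding context, can then serve as a training example, sidestepping the need for manually annotated gold mentions.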
## License
This project is licensed under the MIT license.
## Citation
If you find KRISSBERT useful in your research, please cite the following paper:
```bibtex
@article{krissbert,
  author     = {Sheng Zhang and Hao Cheng and Shikhar Vashishth and Cliff Wong and Jinfeng Xiao and Xiaodong Liu and Tristan Naumann and Jianfeng Gao and Hoifung Poon},
  title      = {Knowledge-Rich Self-Supervision for Biomedical Entity Linking},
  year       = {2021},
  url        = {https://arxiv.org/abs/2112.07887},
  eprinttype = {arXiv},
  eprint     = {2112.07887},
}
```
| Property | Details |
|---|---|
| Model Type | Contextual encoder for entity linking |
| Training Data | Biomedical entity names from the UMLS ontology and entity linking examples from PubMed abstracts |