🚀 TinySapBERT
This repository provides TinySapBERT, a tiny-sized biomedical entity representation model trained with the official SapBERT code and instructions (Liu et al., NAACL 2021).
🚀 Quick Start
TinySapBERT was trained following the SapBERT scheme (Liu et al., NAACL 2021), using the official SapBERT code and instructions. We used our TinyPubMedBERT, a tiny-sized language model, as the initialization point for training.
Note: TinyPubMedBERT is a distilled version of PubMedBERT (Gu et al., 2021) and was open-sourced as part of the release of the KAZU (Korea University and AstraZeneca) framework.
- For more details, please visit the KAZU framework repository or refer to our paper, Biomedical NER for the Enterprise with Distillated BERN2 and the Kazu Framework (EMNLP 2022 Industry Track).
- For a demo of the KAZU framework, please visit http://kazu.korea.ac.kr
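As a quick-start sketch, the snippet below shows one way to obtain entity embeddings from TinySapBERT with the Hugging Face `transformers` library, following the SapBERT convention of using the [CLS] token as the mention representation. The model identifier string is an assumption for illustration; check the Hub for the exact repository name.

```python
# Sketch: embedding biomedical entity mentions with TinySapBERT.
# The model ID below is an assumption -- verify the exact name on the Hub.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "dmis-lab/TinySapBERT-from-TinyPubMedBERT-v1.0"  # assumed model ID


def embed(texts, tokenizer, model):
    """Return one [CLS] embedding per input mention (SapBERT convention)."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    # The first token of each sequence is [CLS]; use it as the entity vector.
    return out.last_hidden_state[:, 0, :].numpy()


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME)
    vecs = embed(["covid-19", "coronavirus infection", "aspirin"],
                 tokenizer, model)
    # Synonymous mentions should score higher than unrelated ones.
    print("covid-19 vs coronavirus infection:", cosine(vecs[0], vecs[1]))
    print("covid-19 vs aspirin:", cosine(vecs[0], vecs[2]))
```

Comparing mention vectors with cosine similarity in this way is the standard use of SapBERT-style encoders for entity linking: candidate names from a biomedical vocabulary are embedded once, and query mentions are matched to their nearest neighbors.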
📚 Documentation
Citation info
The joint-first authors are Richard Jackson (AstraZeneca) and WonJin Yoon (Korea University).
Please cite our work using the BibTeX entry below, or find the full citation information here
@inproceedings{YoonAndJackson2022BiomedicalNER,
    title = "Biomedical {NER} for the Enterprise with Distillated {BERN}2 and the Kazu Framework",
    author = "Yoon, Wonjin and Jackson, Richard and Ford, Elliot and Poroshin, Vladimir and Kang, Jaewoo",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.emnlp-industry.63",
    pages = "619--626",
}
This model was trained using resources from the SapBERT paper. We are grateful to the authors for making these resources publicly available!
Liu, Fangyu, et al. "Self-Alignment Pretraining for Biomedical Entity Representations."
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.
Contact Information
If you need help or encounter issues using the code or the model (the NER module of KAZU) in this repository, please contact WonJin Yoon (wonjin.info (at) gmail.com) or submit a GitHub issue.