🚀 TinySapBERT
This repository provides TinySapBERT, a tiny-sized biomedical entity representation model trained with the official SapBERT code and instructions (Liu et al., NAACL 2021).
🚀 Quick Start
TinySapBERT was trained following the SapBERT scheme (Liu et al., NAACL 2021), using the official SapBERT code and instructions. We used our TinyPubMedBERT, a tiny-sized language model, as the initialization point for training.
Note: TinyPubMedBERT is a distilled version of PubMedBERT (Gu et al., 2021) and was open-sourced as part of the release of the KAZU (Korea University and AstraZeneca) framework.
- For more details, please visit the KAZU framework repository or refer to our paper, Biomedical NER for the Enterprise with Distillated BERN2 and the Kazu Framework (EMNLP 2022 Industry Track).
- For a demo of the KAZU framework, please visit http://kazu.korea.ac.kr
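As a quick-start sketch, the snippet below shows one way to obtain entity embeddings from TinySapBERT with the Hugging Face `transformers` library, following the SapBERT convention of using the [CLS] token as the mention representation. The model identifier string is an assumption for illustration; check the Hub for the exact repository name.

```python
# Sketch: embedding biomedical entity mentions with TinySapBERT.
# The model ID below is an assumption -- verify the exact name on the Hub.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "dmis-lab/TinySapBERT-from-TinyPubMedBERT-v1.0"  # assumed model ID


def embed(texts, tokenizer, model):
    """Return one [CLS] embedding per input mention (SapBERT convention)."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    # The first token of each sequence is [CLS]; use it as the entity vector.
    return out.last_hidden_state[:, 0, :].numpy()


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME)
    vecs = embed(["covid-19", "coronavirus infection", "aspirin"],
                 tokenizer, model)
    # Synonymous mentions should score higher than unrelated ones.
    print("covid-19 vs coronavirus infection:", cosine(vecs[0], vecs[1]))
    print("covid-19 vs aspirin:", cosine(vecs[0], vecs[2]))
```

Comparing mention vectors with cosine similarity in this way is the standard use of SapBERT-style encoders for entity linking: candidate names from a biomedical vocabulary are embedded once, and query mentions are matched to their nearest neighbors.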
📚 Documentation
Citation info
The joint-first authors are Richard Jackson (AstraZeneca) and WonJin Yoon (Korea University).
Please cite our work using the BibTeX entry below, or find the full citation information here
@inproceedings{YoonAndJackson2022BiomedicalNER,
    title = "Biomedical {NER} for the Enterprise with Distillated {BERN}2 and the Kazu Framework",
    author = "Yoon, Wonjin and Jackson, Richard and Ford, Elliot and Poroshin, Vladimir and Kang, Jaewoo",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.emnlp-industry.63",
    pages = "619--626",
}
This model was trained using resources from the SapBERT paper. We are grateful to the authors for making these resources publicly available!
Liu, Fangyu, et al. "Self-Alignment Pretraining for Biomedical Entity Representations."
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.
Contact Information
If you need help or encounter issues using the code or the model (the NER module of KAZU) in this repository, please contact WonJin Yoon (wonjin.info (at) gmail.com) or submit a GitHub issue.