VetBERT: Pretrained Model for Veterinary Clinical Tasks
This is a pretrained model for NLP tasks on veterinary clinical notes. It can be used to process and analyze veterinary clinical text, supporting downstream applications in veterinary medicine.
Quick Start
Load the model via the transformers library:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("havocy28/VetBERT")
model = AutoModelForMaskedLM.from_pretrained("havocy28/VetBERT")
VetBERT_masked = pipeline("fill-mask", model=model, tokenizer=tokenizer)
VetBERT_masked("Suspected pneumonia, will require an [MASK] but in the meantime will prescribe antibiotics")
```
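The fill-mask pipeline returns a ranked list of candidate completions for the `[MASK]` token. As a minimal illustration (variable names here are only for the example), you can print each predicted token and its score:

```python
# Inspect the top predictions for the masked token.
predictions = VetBERT_masked(
    "Suspected pneumonia, will require an [MASK] but in the meantime will prescribe antibiotics"
)
for pred in predictions:
    # Each prediction is a dict with the filled-in token and its probability score.
    print(f"{pred['token_str']}\t{pred['score']:.4f}")
```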
Features
- Domain-specific: This pretrained model is tailored to veterinary clinical notes, enabling it to handle tasks specific to veterinary medicine.
- Grounded in published research: Introduced in [Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes](https://aclanthology.org/2020.bionlp-1.17) (Hur et al., BioNLP 2020).
Installation
The provided code uses the transformers library. You can install it with:

```bash
pip install transformers
```
Documentation
Model Source
This is the pretrained VetBERT model from the GitHub repository: https://github.com/havocy28/VetBERT.
Model Introduction
The [Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes](https://aclanthology.org/2020.bionlp-1.17) (Hur et al., BioNLP 2020) paper introduced the VetBERT model. It is initialized from ClinicalBERT (Bio + Clinical BERT) and further pretrained on the VetCompass Australia corpus for veterinary-specific tasks. The paper also introduces VetBERTDx, the version of VetBERT fine-tuned for the disease syndrome classification task.
Pretraining Data
The VetBERT model was initialized from the Bio_ClinicalBERT model, which was itself initialized from BERT. VetBERT was then pretrained on over 15 million veterinary clinical records comprising 1.3 billion tokens.
Pretraining Hyperparameters
During the pretraining phase for VetBERT, we used a batch size of 32, a maximum sequence length of 512, and a learning rate of 5e-5. The dup factor for duplicating input data with different masks was set to 5. All other parameters were left at their defaults (specifically, masked language model probability = 0.15 and max predictions per sequence = 20).
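For readers who want to set up a comparable run, the sketch below shows continued masked-language-model pretraining with these hyperparameters using the current transformers Trainer API. It is an approximation rather than the original pretraining script: the starting checkpoint id, the toy corpus, and the dataset handling (the dup-factor duplication in particular, which the Trainer does not perform) are assumptions.

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start from Bio+Clinical BERT, as VetBERT did (checkpoint id assumed).
base = "emilyalsentzer/Bio_ClinicalBERT"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base)

# Placeholder corpus standing in for the VetCompass Australia notes.
notes = ["vaccination given, all well on exam", "vomiting overnight, started antiemetics"]
dataset = Dataset.from_dict({"text": notes}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),  # max sequence length 512
    batched=True,
    remove_columns=["text"],
)

# 15% masking probability, matching the default noted above.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="vetbert-mlm",
    per_device_train_batch_size=32,  # batch size 32
    learning_rate=5e-5,              # learning rate 5e-5
    num_train_epochs=1,
)

trainer = Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator)
trainer.train()
```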
VetBERT Finetuning
VetBERT was further fine-tuned on a set of 5,002 annotated clinical notes to classify the disease syndrome associated with each note, as outlined in the [Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes](https://aclanthology.org/2020.bionlp-1.17) paper.
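The snippet below is a hedged sketch of what such a fine-tuning setup can look like: a sequence-classification head on top of VetBERT. The label set is illustrative only (it is not the paper's syndrome inventory), the example note is invented, and the head shown here is untrained.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative label set; the paper defines its own disease syndrome categories.
labels = ["gastrointestinal", "respiratory", "other"]

tokenizer = AutoTokenizer.from_pretrained("havocy28/VetBERT")
model = AutoModelForSequenceClassification.from_pretrained(
    "havocy28/VetBERT", num_labels=len(labels)
)

# The classification head is freshly initialized here; in practice it would be
# trained on the annotated notes before use.
inputs = tokenizer(
    "vomiting and diarrhoea for two days, started on antiemetics",
    return_tensors="pt",
    truncation=True,
    max_length=512,
)
with torch.no_grad():
    logits = model(**inputs).logits
print(labels[logits.argmax(dim=-1).item()])
```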
Technical Details
- Data scale: Trained on over 15 million veterinary clinical records and 1.3 billion tokens, providing a large-scale data foundation.
- Hyperparameters: Pretraining used a batch size of 32, a maximum sequence length of 512, and a learning rate of 5e-5 (see Pretraining Hyperparameters above).
License
The model is released under the OpenRAIL license.
Model Information
| Property | Details |
|----------|---------|
| Model Type | VetBERT pretrained model |
| Training Data | Over 15 million veterinary clinical records and 1.3 billion tokens |
| Pipeline Tag | fill-mask |
| Tags | biology, medical |
Widget Examples
- Post-operative checkup
  - Text: "poc all well. wound healed. No [MASK] on exam. Microchip working. Sign off, resee if worried."
- Blood glucose check
  - Text: "other 2 degu's unwell recently want health check for this one appears well for age blood [MASK] 3.8. offer to reweigh and monitor weight"
Citation
Please cite this article: Brian Hur, Timothy Baldwin, Karin Verspoor, Laura Hardefeldt, and James Gilkerson. 2020. [Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes](https://aclanthology.org/2020.bionlp-1.17). In Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing, pages 156–166, Online. Association for Computational Linguistics.