VetBERT: Pretrained Model for Veterinary Clinical Tasks
This is a pretrained model for NLP tasks on veterinary clinical notes. It can be used to process and analyze veterinary clinical text, supporting downstream applications in veterinary medicine.
Quick Start
Load the model via the transformers library:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("havocy28/VetBERT")
model = AutoModelForMaskedLM.from_pretrained("havocy28/VetBERT")
VetBERT_masked = pipeline("fill-mask", model=model, tokenizer=tokenizer)
VetBERT_masked("Suspected pneumonia, will require an [MASK] but in the meantime will prescribe antibiotics")
```
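The fill-mask pipeline returns a ranked list of candidate completions for the `[MASK]` token. As a minimal illustration (variable names here are only for the example), you can print each predicted token and its score:

```python
# Inspect the top predictions for the masked token.
predictions = VetBERT_masked(
    "Suspected pneumonia, will require an [MASK] but in the meantime will prescribe antibiotics"
)
for pred in predictions:
    # Each prediction is a dict with the filled-in token and its probability score.
    print(f"{pred['token_str']}\t{pred['score']:.4f}")
```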
Features
- Domain-specific: This pretrained model is tailored to veterinary clinical notes, enabling it to handle tasks specific to veterinary medicine.
- Grounded in published research: Introduced in [Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes](https://aclanthology.org/2020.bionlp-1.17) (Hur et al., BioNLP 2020).
Installation
The provided code uses the transformers library. You can install it with:

```bash
pip install transformers
```
Documentation
Model Source
This is the pretrained VetBERT model from the GitHub repository: https://github.com/havocy28/VetBERT.
Model Introduction
The [Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes](https://aclanthology.org/2020.bionlp-1.17) (Hur et al., BioNLP 2020) paper introduced the VetBERT model. It is initialized from ClinicalBERT (Bio + Clinical BERT) and further pretrained on the VetCompass Australia corpus for veterinary-specific tasks. The paper also introduces VetBERTDx, the version of VetBERT fine-tuned for the disease syndrome classification task.
Pretraining Data
The VetBERT model was initialized from the Bio_ClinicalBERT model, which was itself initialized from BERT. VetBERT was then pretrained on over 15 million veterinary clinical records comprising 1.3 billion tokens.
Pretraining Hyperparameters
During the pretraining phase for VetBERT, we used a batch size of 32, a maximum sequence length of 512, and a learning rate of 5e-5. The dup factor for duplicating input data with different masks was set to 5. All other parameters were left at their defaults (specifically, masked language model probability = 0.15 and max predictions per sequence = 20).
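For readers who want to set up a comparable run, the sketch below shows continued masked-language-model pretraining with these hyperparameters using the current transformers Trainer API. It is an approximation rather than the original pretraining script: the starting checkpoint id, the toy corpus, and the dataset handling (the dup-factor duplication in particular, which the Trainer does not perform) are assumptions.

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start from Bio+Clinical BERT, as VetBERT did (checkpoint id assumed).
base = "emilyalsentzer/Bio_ClinicalBERT"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base)

# Placeholder corpus standing in for the VetCompass Australia notes.
notes = ["vaccination given, all well on exam", "vomiting overnight, started antiemetics"]
dataset = Dataset.from_dict({"text": notes}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),  # max sequence length 512
    batched=True,
    remove_columns=["text"],
)

# 15% masking probability, matching the default noted above.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="vetbert-mlm",
    per_device_train_batch_size=32,  # batch size 32
    learning_rate=5e-5,              # learning rate 5e-5
    num_train_epochs=1,
)

trainer = Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator)
trainer.train()
```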
VetBERT Finetuning
VetBERT was further fine-tuned on a set of 5,002 annotated clinical notes to classify the disease syndrome associated with each note, as outlined in the [Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes](https://aclanthology.org/2020.bionlp-1.17) paper.
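The snippet below is a hedged sketch of what such a fine-tuning setup can look like: a sequence-classification head on top of VetBERT. The label set is illustrative only (it is not the paper's syndrome inventory), the example note is invented, and the head shown here is untrained.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative label set; the paper defines its own disease syndrome categories.
labels = ["gastrointestinal", "respiratory", "other"]

tokenizer = AutoTokenizer.from_pretrained("havocy28/VetBERT")
model = AutoModelForSequenceClassification.from_pretrained(
    "havocy28/VetBERT", num_labels=len(labels)
)

# The classification head is freshly initialized here; in practice it would be
# trained on the annotated notes before use.
inputs = tokenizer(
    "vomiting and diarrhoea for two days, started on antiemetics",
    return_tensors="pt",
    truncation=True,
    max_length=512,
)
with torch.no_grad():
    logits = model(**inputs).logits
print(labels[logits.argmax(dim=-1).item()])
```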
Technical Details
- Data scale: Trained on over 15 million veterinary clinical records and 1.3 billion tokens, providing a large-scale data foundation.
- Hyperparameters: Pretraining used a batch size of 32, a maximum sequence length of 512, and a learning rate of 5e-5 (see Pretraining Hyperparameters above).
License
The model is released under the OpenRAIL license.
Model Information
| Property | Details |
|----------|---------|
| Model Type | VetBERT pretrained model |
| Training Data | Over 15 million veterinary clinical records and 1.3 billion tokens |
| Pipeline Tag | fill-mask |
| Tags | biology, medical |
Widget Examples
- Post-operative checkup
  - Text: "poc all well. wound healed. No [MASK] on exam. Microchip working. Sign off, resee if worried."
- Blood glucose check
  - Text: "other 2 degu's unwell recently want health check for this one appears well for age blood [MASK] 3.8. offer to reweigh and monitor weight"
Citation
Please cite this article: Brian Hur, Timothy Baldwin, Karin Verspoor, Laura Hardefeldt, and James Gilkerson. 2020. [Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes](https://aclanthology.org/2020.bionlp-1.17). In Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing, pages 156–166, Online. Association for Computational Linguistics.