# BiomedNLP-PubMedBERT finetuned on textual entailment (NLI)
This project presents the `microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext` model finetuned on the MNLI dataset, making it useful for textual entailment tasks on biomedical corpora.
## Quick Start
The model is designed for textual entailment tasks. Given two sentences (a premise and a hypothesis), it outputs logits for the three labels: entailment, neutral, and contradiction.
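For a quick check without writing any tokenization code, the `transformers` pipeline API can score a sentence pair directly. This is a minimal sketch (not from the original README), assuming the standard text-classification pipeline behaviour for sentence pairs:

```python
from transformers import pipeline

# Load the model into a text-classification pipeline.
classifier = pipeline(
    "text-classification",
    model="lighteternal/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-mnli",
)

# A premise/hypothesis pair is passed as a dict with "text" and "text_pair".
result = classifier(
    {"text": "EpCAM is overexpressed in breast cancer.",
     "text_pair": "EpCAM is downregulated in breast cancer."},
    top_k=None,  # return scores for all three labels
)
print(result)  # labels come back as LABEL_0/1/2; see the label mapping in "Documentation"
```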
## Features
- Biomedical Focus: Specifically finetuned for biomedical corpora, making it well-suited for related textual entailment tasks.
- Easy to Use: Can be easily integrated into existing projects with the provided code examples.
## Installation
No specific installation steps are provided in the original README. However, to use the model you need the `transformers` library; the usage examples below also rely on `torch` and `numpy`. You can install them with the following command:

```bash
pip install transformers torch numpy
```
## Usage Examples
### Basic Usage
```python
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("lighteternal/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-mnli")
model = AutoModelForSequenceClassification.from_pretrained("lighteternal/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-mnli")

premise = 'EpCAM is overexpressed in breast cancer'
hypothesis = 'EpCAM is downregulated in breast cancer.'

# Encode the sentence pair; truncate only the premise if the pair is too long.
x = tokenizer.encode(premise, hypothesis, return_tensors='pt', truncation='only_first')

# Forward pass; no gradients are needed for inference.
with torch.no_grad():
    logits = model(x).logits

# Softmax over the three classes: entailment, neutral, contradiction.
probs = logits.softmax(dim=1)
print('Probabilities for entailment, neutral, contradiction:\n',
      np.around(probs.cpu().numpy(), 3))
```
### Advanced Usage
You can also use the model in a more complex pipeline, for example scoring many sentence pairs at once or integrating it with other NLP components. The basic principle remains the same: input two sentences and read off the probabilities of the three labels.
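As a sketch of such an integration (the example pairs and variable names below are illustrative, not from the original README), several premise/hypothesis pairs can be batched through the tokenizer and model in a single call:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "lighteternal/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

labels = ["entailment", "neutral", "contradiction"]  # index order per this README
pairs = [
    ("EpCAM is overexpressed in breast cancer.", "EpCAM is downregulated in breast cancer."),
    ("EpCAM is overexpressed in breast cancer.", "EpCAM expression is elevated in breast cancer."),
]

# Tokenize all pairs at once; padding/truncation keep the batch rectangular.
batch = tokenizer([p for p, _ in pairs], [h for _, h in pairs],
                  padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    probs = model(**batch).logits.softmax(dim=1)

for (premise, hypothesis), row in zip(pairs, probs):
    scores = {label: round(score, 3) for label, score in zip(labels, row.tolist())}
    print(f"{premise} / {hypothesis} -> {scores}")
```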
## Documentation
### Using the HuggingFace Model Widget
You can test the model using the HuggingFace model widget on the side of the model page:
- Input two sentences (premise and hypothesis) one after the other.
- The model returns the probabilities of 3 labels: entailment (LABEL:0), neutral (LABEL:1), and contradiction (LABEL:2), respectively; see the sketch below for mapping these indices to readable names.
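If you prefer human-readable label names instead of the generic LABEL indices, you can overwrite the model config's `id2label`/`label2id` mappings (standard `transformers` config attributes), using the index order stated above. A minimal sketch:

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "lighteternal/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-mnli"
)

# Index order per this README: 0 = entailment, 1 = neutral, 2 = contradiction.
model.config.id2label = {0: "entailment", 1: "neutral", 2: "contradiction"}
model.config.label2id = {v: k for k, v in model.config.id2label.items()}
```

A pipeline built from this model object should then report "entailment", "neutral", or "contradiction" instead of LABEL_0/1/2.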
### Local Usage
To use the model locally on your machine, follow the code example provided in the "Usage Examples" section.
## Technical Details
The model is based on the microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext architecture and finetuned on the MNLI dataset. It outputs logits for the three classes: entailment, neutral, and contradiction, which can be further converted to probabilities using the softmax function.
## License
This project is licensed under the MIT license.
## Metrics
Evaluation of classification accuracy (entailment, contradiction, neutral) on the MNLI test set:
| Property | Details |
|----------|---------|
| Accuracy | 0.8338 |
See the Training Metrics tab for detailed info.