# BiomedNLP-PubMedBERT finetuned on textual entailment (NLI)
This project presents the `microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext` model finetuned on the MNLI dataset, making it useful for textual entailment tasks on biomedical corpora.
## Quick Start
The model is designed for textual entailment tasks. Given two sentences (a premise and a hypothesis), it outputs logits for the three labels: entailment, neutral, and contradiction.
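For a quick check without writing any tokenization code, the `transformers` pipeline API can score a sentence pair directly. This is a minimal sketch (not from the original README), assuming the standard text-classification pipeline behaviour for sentence pairs:

```python
from transformers import pipeline

# Load the model into a text-classification pipeline.
classifier = pipeline(
    "text-classification",
    model="lighteternal/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-mnli",
)

# A premise/hypothesis pair is passed as a dict with "text" and "text_pair".
result = classifier(
    {"text": "EpCAM is overexpressed in breast cancer.",
     "text_pair": "EpCAM is downregulated in breast cancer."},
    top_k=None,  # return scores for all three labels
)
print(result)  # labels come back as LABEL_0/1/2; see the label mapping in "Documentation"
```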
## Features
- Biomedical Focus: Specifically finetuned for biomedical corpora, making it well-suited for related textual entailment tasks.
- Easy to Use: Can be easily integrated into existing projects with the provided code examples.
## Installation
No specific installation steps are provided in the original README. However, to use the model you need the `transformers` library; the usage examples below also rely on `torch` and `numpy`. You can install them with the following command:

```bash
pip install transformers torch numpy
```
## Usage Examples
### Basic Usage
```python
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("lighteternal/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-mnli")
model = AutoModelForSequenceClassification.from_pretrained("lighteternal/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-mnli")

premise = 'EpCAM is overexpressed in breast cancer'
hypothesis = 'EpCAM is downregulated in breast cancer.'

# Encode the sentence pair; truncate only the premise if the pair is too long.
x = tokenizer.encode(premise, hypothesis, return_tensors='pt', truncation='only_first')

# Forward pass; no gradients are needed for inference.
with torch.no_grad():
    logits = model(x).logits

# Softmax over the three classes: entailment, neutral, contradiction.
probs = logits.softmax(dim=1)
print('Probabilities for entailment, neutral, contradiction:\n',
      np.around(probs.cpu().numpy(), 3))
```
### Advanced Usage
You can also use the model in a more complex pipeline, for example scoring many sentence pairs at once or integrating it with other NLP components. The basic principle remains the same: input two sentences and read off the probabilities of the three labels.
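As a sketch of such an integration (the example pairs and variable names below are illustrative, not from the original README), several premise/hypothesis pairs can be batched through the tokenizer and model in a single call:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "lighteternal/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

labels = ["entailment", "neutral", "contradiction"]  # index order per this README
pairs = [
    ("EpCAM is overexpressed in breast cancer.", "EpCAM is downregulated in breast cancer."),
    ("EpCAM is overexpressed in breast cancer.", "EpCAM expression is elevated in breast cancer."),
]

# Tokenize all pairs at once; padding/truncation keep the batch rectangular.
batch = tokenizer([p for p, _ in pairs], [h for _, h in pairs],
                  padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    probs = model(**batch).logits.softmax(dim=1)

for (premise, hypothesis), row in zip(pairs, probs):
    scores = {label: round(score, 3) for label, score in zip(labels, row.tolist())}
    print(f"{premise} / {hypothesis} -> {scores}")
```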
## Documentation
### Using the HuggingFace Model Widget
You can test the model using the HuggingFace model widget on the side of the model page:
- Input two sentences (premise and hypothesis) one after the other.
- The model returns the probabilities of 3 labels: entailment (LABEL:0), neutral (LABEL:1), and contradiction (LABEL:2), respectively; see the sketch below for mapping these indices to readable names.
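If you prefer human-readable label names instead of the generic LABEL indices, you can overwrite the model config's `id2label`/`label2id` mappings (standard `transformers` config attributes), using the index order stated above. A minimal sketch:

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "lighteternal/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-mnli"
)

# Index order per this README: 0 = entailment, 1 = neutral, 2 = contradiction.
model.config.id2label = {0: "entailment", 1: "neutral", 2: "contradiction"}
model.config.label2id = {v: k for k, v in model.config.id2label.items()}
```

A pipeline built from this model object should then report "entailment", "neutral", or "contradiction" instead of LABEL_0/1/2.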
### Local Usage
To use the model locally on your machine, follow the code example provided in the "Usage Examples" section.
## Technical Details
The model is based on the microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext architecture and finetuned on the MNLI dataset. It outputs logits for the three classes: entailment, neutral, and contradiction, which can be further converted to probabilities using the softmax function.
## License
This project is licensed under the MIT license.
## Metrics
Evaluation of classification accuracy (entailment, contradiction, neutral) on the MNLI test set:
| Property | Details |
|----------|---------|
| Accuracy | 0.8338 |
See the Training Metrics tab for detailed info.