BioELECTRA-PICO - Open-source Biomedical Language Model for PICO, with State-of-the-Art Results on Multiple Biomedical NLP Tasks

BioELECTRA-PICO

Developed by kamalkraj

BioELECTRA is a biomedical domain-specific language model pre-trained with the ELECTRA framework. It achieves state-of-the-art results on multiple biomedical NLP benchmarks.

Large Language Model

Transformers

#Biomedical Text Encoding #Replaced Token Detection #Clinical NLP Optimization

Downloads 10.88k

Release Time : 3/2/2022

Model Overview

BioELECTRA is a biomedical language encoder pre-trained from scratch on biomedical text with a biomedical vocabulary, using ELECTRA's 'replaced token detection' pre-training technique. It is optimized specifically for biomedical text processing.
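To illustrate what 'replaced token detection' asks of the encoder, here is a minimal sketch in plain Python. It is not the BioELECTRA training code: the function names `corrupt_tokens` and `rtd_labels`, the `candidates` list, and the random sampler standing in for ELECTRA's generator network are all assumptions made for illustration. The idea it shows is that some input tokens are swapped for plausible alternatives and the discriminator predicts, for every position, whether the token is original or replaced.

```python
import random


def corrupt_tokens(tokens, candidates, replace_prob=0.15, seed=0):
    """Swap a random subset of tokens for sampled candidates.

    Hypothetical stand-in for ELECTRA's generator, which is a small
    masked language model rather than a uniform sampler.
    """
    rng = random.Random(seed)
    corrupted = []
    for tok in tokens:
        if rng.random() < replace_prob:
            corrupted.append(rng.choice(candidates))
        else:
            corrupted.append(tok)
    return corrupted


def rtd_labels(original, corrupted):
    """Per-token discriminator targets: 1 = replaced, 0 = original.

    A sampled token that happens to equal the original counts as original.
    """
    return [int(o != c) for o, c in zip(original, corrupted)]


original = ["aspirin", "reduced", "headache", "duration"]
corrupted = ["aspirin", "increased", "headache", "duration"]
print(rtd_labels(original, corrupted))
```

Because every position receives a label, the training signal covers the whole input sequence instead of only a masked subset.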

Model Features

Domain-Specific Pre-training

Pre-trained specifically for the biomedical domain using PubMed and PMC full-text data

Efficient Discriminative Training

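Adopts ELECTRA's replaced token detection technique, which is more compute-efficient than traditional masked language model (MLM) training because the discriminator learns from every input token rather than only the masked subset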

Leading Multi-task Performance

Achieved state-of-the-art results on all 13 datasets in the BLURB benchmark and on all 4 clinical datasets from the BLUE benchmark, across 7 different NLP tasks

Model Capabilities

Biomedical Text Understanding

Clinical Text Analysis

Medical Question Answering

Medical Reasoning

Medical Text Classification

Use Cases

Clinical Decision Support

Medical Literature Q&A

Answering medical questions based on PubMed literature

Achieved 64% accuracy on the PubMedQA dataset (a 2.98% accuracy improvement)

Medical Research

Medical Text Reasoning

Medical text entailment judgment

Achieved 86.34% accuracy on the MedNLI dataset (a 1.39% accuracy improvement)

🚀 BioELECTRA-PICO

BioELECTRA-PICO is a biomedical domain-specific language encoder model. It applies the 'replaced token detection' pretraining technique and achieves state-of-the-art performance on multiple biomedical NLP benchmarks.

🚀 Quick Start

Widget Information

The model page widget is preloaded with the following example sentence:

Those in the aspirin group experienced reduced duration of headache compared to those in the placebo arm (P<0.05)
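The example sentence above could be run through the model roughly as follows. This is a sketch under stated assumptions, not the author's documented usage: it assumes the checkpoint is published on the Hugging Face Hub under the id `kamalkraj/BioELECTRA-PICO` and that it carries a token-classification head, neither of which this page states. The helper names `load_pico_tagger`, `group_by_label`, and `run_example` are hypothetical.

```python
def load_pico_tagger(model_id="kamalkraj/BioELECTRA-PICO"):
    """Build a token-classification pipeline.

    The model id and the task type are assumptions, not stated on this page.
    """
    # Imported lazily so group_by_label works without transformers installed.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge word pieces into spans
    )


def group_by_label(entities):
    """Collect predicted text spans per label from the pipeline output."""
    grouped = {}
    for ent in entities:
        # Aggregated output uses "entity_group"; raw output uses "entity".
        label = ent.get("entity_group") or ent.get("entity")
        grouped.setdefault(label, []).append(ent["word"])
    return grouped


def run_example():
    """Tag the widget sentence; downloads the model on first call."""
    text = (
        "Those in the aspirin group experienced reduced duration of headache "
        "compared to those in the placebo arm (P<0.05)"
    )
    tagger = load_pico_tagger()
    return group_by_label(tagger(text))
```

Calling `run_example()` requires the `transformers` package and network access. The label names in the result depend on the checkpoint's configuration and are not listed on this page.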

📚 Documentation

Citation

Cite the BioELECTRA paper using the citation below:

@inproceedings{kanakarajan-etal-2021-bioelectra,
    title = "{B}io{ELECTRA}:Pretrained Biomedical text Encoder using Discriminators",
    author = "Kanakarajan, Kamal raj  and
      Kundumani, Bhuvana  and
      Sankarasubbu, Malaikannan",
    booktitle = "Proceedings of the 20th Workshop on Biomedical Language Processing",
    month = jun,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.bionlp-1.16",
    doi = "10.18653/v1/2021.bionlp-1.16",
    pages = "143--154",
    abstract = "Recent advancements in pretraining strategies in NLP have shown a significant improvement in the performance of models on various text mining tasks. We apply {`}replaced token detection{'} pretraining technique proposed by ELECTRA and pretrain a biomedical language model from scratch using biomedical text and vocabulary. We introduce BioELECTRA, a biomedical domain-specific language encoder model that adapts ELECTRA for the Biomedical domain. WE evaluate our model on the BLURB and BLUE biomedical NLP benchmarks. BioELECTRA outperforms the previous models and achieves state of the art (SOTA) on all the 13 datasets in BLURB benchmark and on all the 4 Clinical datasets from BLUE Benchmark across 7 different NLP tasks. BioELECTRA pretrained on PubMed and PMC full text articles performs very well on Clinical datasets as well. BioELECTRA achieves new SOTA 86.34{\%}(1.39{\%} accuracy improvement) on MedNLI and 64{\%} (2.98{\%} accuracy improvement) on PubMedQA dataset.",
}
