🚀 BioELECTRA-PICO
BioELECTRA-PICO is a pretrained text encoder for the biomedical domain. It adopts ELECTRA's "replaced token detection" pretraining technique, performs strongly across multiple biomedical NLP benchmarks, and provides solid support for biomedical text-mining tasks.
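The "replaced token detection" objective mentioned above trains a discriminator to decide, for every position, whether the token is original or was swapped in by a small generator. The following toy sketch (plain Python, not the actual ELECTRA implementation; the function name and tiny vocabulary are illustrative) shows how the corruption step produces the per-token 0/1 labels the discriminator learns to predict:

```python
import random

def corrupt_tokens(tokens, vocab, mask_prob=0.3, seed=0):
    """Replace a fraction of tokens with substitutes (the generator's role
    in ELECTRA) and record which positions changed (discriminator labels)."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            # swap in a different token from the vocabulary
            corrupted.append(rng.choice([v for v in vocab if v != tok]))
            labels.append(1)  # replaced -> discriminator should flag it
        else:
            corrupted.append(tok)
            labels.append(0)  # original token kept
    return corrupted, labels

tokens = ["aspirin", "reduced", "headache", "duration", "significantly"]
vocab = tokens + ["placebo", "increased", "nausea"]
corrupted, labels = corrupt_tokens(tokens, vocab)
print(corrupted)
print(labels)  # 1 marks positions the discriminator must detect as replaced
```

Unlike masked language modeling, every position (not only the masked 15%) contributes a training signal, which is the efficiency gain ELECTRA-style pretraining exploits.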
🚀 Quick Start
Citation
If you use our work, please cite our paper with the following BibTeX entry:
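A minimal usage sketch with the 🤗 Transformers library. PICO extraction is a token-classification task, so the model can be wrapped in a standard `token-classification` pipeline; the model identifier `kamalkraj/BioELECTRA-PICO` is an assumption here and should be verified against the actual Hub repository name:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "kamalkraj/BioELECTRA-PICO"  # assumed Hub id; verify before use
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# aggregation_strategy="simple" merges word-piece predictions into word spans
pico_tagger = pipeline("token-classification", model=model,
                       tokenizer=tokenizer, aggregation_strategy="simple")

sentence = ("Those in the aspirin group experienced reduced duration of "
            "headache compared to those in the placebo arm (P<0.05)")
for entity in pico_tagger(sentence):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```

Each printed line is one predicted PICO span (e.g. population, intervention, comparator, or outcome) with its confidence score.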
```bibtex
@inproceedings{kanakarajan-etal-2021-bioelectra,
    title = "{B}io{ELECTRA}: Pretrained Biomedical text Encoder using Discriminators",
    author = "Kanakarajan, Kamal raj and
      Kundumani, Bhuvana and
      Sankarasubbu, Malaikannan",
    booktitle = "Proceedings of the 20th Workshop on Biomedical Language Processing",
    month = jun,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.bionlp-1.16",
    doi = "10.18653/v1/2021.bionlp-1.16",
    pages = "143--154",
    abstract = "Recent advancements in pretraining strategies in NLP have shown a significant improvement in the performance of models on various text mining tasks. We apply {`}replaced token detection{'} pretraining technique proposed by ELECTRA and pretrain a biomedical language model from scratch using biomedical text and vocabulary. We introduce BioELECTRA, a biomedical domain-specific language encoder model that adapts ELECTRA for the Biomedical domain. We evaluate our model on the BLURB and BLUE biomedical NLP benchmarks. BioELECTRA outperforms the previous models and achieves state of the art (SOTA) on all the 13 datasets in BLURB benchmark and on all the 4 Clinical datasets from BLUE Benchmark across 7 different NLP tasks. BioELECTRA pretrained on PubMed and PMC full text articles performs very well on Clinical datasets as well. BioELECTRA achieves new SOTA 86.34{\%}(1.39{\%} accuracy improvement) on MedNLI and 64{\%} (2.98{\%} accuracy improvement) on PubMedQA dataset.",
}
```
Example
From related research: the aspirin group experienced a reduced duration of headache compared with the placebo group (P<0.05).
```yaml
widget:
  - text: "Those in the aspirin group experienced reduced duration of headache compared to those in the placebo arm (P<0.05)"
```