BioELECTRA-PICO: an open-source biomedical language model with strong results across multiple tasks for research and analysis


Bioelectra PICO

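Developed by kamalkraj

BioELECTRA is a biomedical language model pretrained with the ELECTRA framework. It set new performance records on a range of biomedical NLP tasks.

Large Language Models

Transformers

#BiomedicalTextEncoding #ReplacedTokenDetection #ClinicalNLP

Downloads: 10.88k

Released: 3/2/2022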

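Model overview

A biomedical language encoder pretrained from scratch on biomedical text with a biomedical vocabulary, using ELECTRA's "replaced token detection" pretraining technique and optimized for biomedical text processing.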
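To make "replaced token detection" concrete, here is a minimal, framework-free sketch of how its training targets are built: a generator proposes replacements at some positions, and the discriminator is trained to label every token as original (0) or replaced (1). The function name `corrupt_and_label` and the toy lambda generator are hypothetical illustrations, not part of BioELECTRA; in ELECTRA the generator is a small masked language model, which this sketch does not reproduce.

```python
import random
from typing import Callable, List, Sequence, Tuple


def corrupt_and_label(
    tokens: Sequence[str],
    mask_prob: float,
    sample_replacement: Callable[[int, str], str],  # hypothetical stand-in for the generator
    rng: random.Random,
) -> Tuple[List[str], List[int]]:
    """Corrupt a token sequence and return per-token discriminator labels.

    Label 1 means the token differs from the original (replaced),
    label 0 means it is the original token.
    """
    corrupted = list(tokens)
    for i, tok in enumerate(tokens):
        # Select each position independently with probability mask_prob.
        if rng.random() < mask_prob:
            corrupted[i] = sample_replacement(i, tok)
    # Labels come from comparison, so a position where the generator happens
    # to reproduce the original token is labeled "original", as in ELECTRA.
    labels = [int(c != o) for c, o in zip(corrupted, tokens)]
    return corrupted, labels


# Toy generator: proposes "placebo" at position 1 and otherwise echoes the input.
tokens = ["aspirin", "reduced", "headache", "duration"]
corrupted, labels = corrupt_and_label(
    tokens, 1.0, lambda i, t: "placebo" if i == 1 else t, random.Random(0)
)
print(corrupted)  # ['aspirin', 'placebo', 'headache', 'duration']
print(labels)     # [0, 1, 0, 0]
```

Note that every position receives a label, including the ones the generator left unchanged; this is what lets the discriminator learn from the whole sequence.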

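Model features

Domain-specific pretraining

Pretrained for the biomedical domain on PubMed and PMC full-text articles

Efficient discriminative training

Uses ELECTRA's replaced token detection, which is more sample-efficient than conventional masked language modeling (MLM) because the discriminator learns from every input token rather than only the masked subset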
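The efficiency argument can be read off the objectives. The formulation below follows the original ELECTRA paper (Clark et al., 2020) rather than anything stated on this page: $m$ is the set of masked positions (a small fraction of the $n$ input positions), $p_G$ is the generator, $D(x^{\text{corrupt}}, t)$ is the discriminator's probability that token $t$ is original, and $\lambda$ is a weighting constant. The MLM loss sums only over $m$, while the discriminator loss sums over all $n$ positions.

```latex
\mathcal{L}_{\text{MLM}}(x, \theta_G) = \mathbb{E}\left[ \sum_{i \in m} -\log p_G\left(x_i \mid x^{\text{masked}}\right) \right]

\mathcal{L}_{\text{Disc}}(x, \theta_D) = \mathbb{E}\left[ \sum_{t=1}^{n} -\mathbb{1}\left(x^{\text{corrupt}}_t = x_t\right) \log D\left(x^{\text{corrupt}}, t\right) - \mathbb{1}\left(x^{\text{corrupt}}_t \neq x_t\right) \log\left(1 - D\left(x^{\text{corrupt}}, t\right)\right) \right]

\min_{\theta_G, \theta_D} \sum_{x \in \mathcal{X}} \mathcal{L}_{\text{MLM}}(x, \theta_G) + \lambda \, \mathcal{L}_{\text{Disc}}(x, \theta_D)
```

After pretraining, the generator is discarded and the discriminator is the encoder that gets fine-tuned.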

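State-of-the-art results across tasks

On the BLURB and BLUE biomedical NLP benchmarks, set new state-of-the-art results on all 13 BLURB datasets and on all 4 clinical datasets from BLUE, across 7 NLP tasks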

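Capabilities

Biomedical text understanding

Clinical text analysis

Medical question answering

Medical reasoning

Medical text classification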

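Use cases

Clinical decision support

Medical literature question answering

Answering medical questions grounded in PubMed literature

Reached 64% accuracy on the PubMedQA dataset, a new state of the art (2.98% accuracy improvement)

Medical research

Medical natural language inference

Judging entailment relations between medical text passages

Reached 86.34% accuracy on the MedNLI dataset, a new state of the art (1.39% accuracy improvement)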

🚀 BioELECTRA - PICO

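BioELECTRA-PICO is a pretrained text encoder for the biomedical domain. It uses ELECTRA's "replaced token detection" pretraining technique, performs strongly on multiple biomedical NLP benchmarks, and provides a solid foundation for biomedical text mining tasks.

🚀 Quick start

Citation

If you use our work, please cite our paper with the following BibTeX entry: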

@inproceedings{kanakarajan-etal-2021-bioelectra,
    title = "{B}io{ELECTRA}:Pretrained Biomedical text Encoder using Discriminators",
    author = "Kanakarajan, Kamal raj  and
      Kundumani, Bhuvana  and
      Sankarasubbu, Malaikannan",
    booktitle = "Proceedings of the 20th Workshop on Biomedical Language Processing",
    month = jun,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.bionlp-1.16",
    doi = "10.18653/v1/2021.bionlp-1.16",
    pages = "143--154",
    abstract = "Recent advancements in pretraining strategies in NLP have shown a significant improvement in the performance of models on various text mining tasks. We apply {`}replaced token detection{'} pretraining technique proposed by ELECTRA and pretrain a biomedical language model from scratch using biomedical text and vocabulary. We introduce BioELECTRA, a biomedical domain-specific language encoder model that adapts ELECTRA for the Biomedical domain. WE evaluate our model on the BLURB and BLUE biomedical NLP benchmarks. BioELECTRA outperforms the previous models and achieves state of the art (SOTA) on all the 13 datasets in BLURB benchmark and on all the 4 Clinical datasets from BLUE Benchmark across 7 different NLP tasks. BioELECTRA pretrained on PubMed and PMC full text articles performs very well on Clinical datasets as well. BioELECTRA achieves new SOTA 86.34{\%}(1.39{\%} accuracy improvement) on MedNLI and 64{\%} (2.98{\%} accuracy improvement) on PubMedQA dataset.",
}

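Example input

The widget example is a clinical-trial sentence reporting that headache duration was shorter in the aspirin group than in the placebo group (P<0.05):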

widget:
  - text: "Those in the aspirin group experienced reduced duration of headache compared to those in the placebo arm (P<0.05)"
