SciFive-base-Pubmed_PMC open-source model - supporting text-to-text research on biomedical literature

Scifive Base Pubmed PMC

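Developed by razent

SciFive is a text-to-text model built specifically for biomedical literature, trained on PubMed and the PMC open-access full-text journal archive.

Tags: large language model, English, #biomedical text generation, #PubMed/PMC pretraining, #medical literature question answering

Downloads: 754

Released: 3/2/2022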

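Model overview

The model is intended for text tasks in the biomedical domain, such as text classification, question answering, and text generation.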

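Model features

Biomedical domain focus

Optimized specifically for biomedical literature, and performs well on text from this domain

Multi-task capability

Handles several kinds of text task, including classification, question answering, and generation

Large-scale training data

Trained on PubMed and the PMC open-access full-text journal archive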

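Model capabilities

Biomedical text classification

Biomedical question answering

Biomedical text generation

Biomedical literature summarization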
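
Because SciFive is a text-to-text model, each capability above comes down to turning a task instance into one input string and reading the answer back as generated text. The sketch below illustrates that idea only. The prefixes in `TASK_PREFIXES` and the `build_input` helper are hypothetical: the exact input format a given SciFive checkpoint expects depends on how it was fine-tuned, so check the GitHub repository before relying on any of these strings.

```python
# Hypothetical task prefixes, for illustration only. The prefixes a given
# SciFive checkpoint expects depend on how that checkpoint was fine-tuned.
TASK_PREFIXES = {
    "classification": "classify",
    "question_answering": "question",
    "summarization": "summarize",
}


def build_input(task: str, text: str, context: str = "") -> str:
    """Cast one task instance into a single text-to-text input string."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task!r}")
    parts = [f"{TASK_PREFIXES[task]}: {text.strip()}"]
    if context:
        # Optional supporting passage, e.g. for question answering.
        parts.append(f"context: {context.strip()}")
    return " ".join(parts)


sentence = "Identification of APC2 , a homologue of the adenomatous polyposis coli tumour suppressor ."
print(build_input("classification", sentence))
```

The resulting string is what would be passed to the tokenizer in place of a raw sentence.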

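Use cases

Medical research

Literature summarization

Automatically generate summaries of biomedical papers

Help researchers read the literature more efficiently

Medical question answering

Answer questions on biomedical topics

Support medical research and clinical decision-making

Academic research

Literature classification

Automatically classify biomedical papers

Improve the efficiency of literature management and retrieval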
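
For the classification use case, a text-to-text model returns its prediction as free text, so the generated string usually has to be mapped back onto a fixed label set. A minimal sketch of that post-processing step follows. The `parse_label` helper and the example labels are hypothetical and not part of SciFive or its repository.

```python
from typing import Optional, Sequence


def parse_label(generated: str, labels: Sequence[str]) -> Optional[str]:
    """Return the label matching `generated`, ignoring case and outer whitespace."""
    normalized = generated.strip().lower()
    for label in labels:
        if label.lower() == normalized:
            return label
    # The model produced text outside the label set.
    return None


# Hypothetical label set, for illustration only.
LABELS = ["oncology", "cardiology", "neurology"]
print(parse_label(" Oncology ", LABELS))
```

Returning `None` for unrecognized output keeps off-label generations visible to the caller instead of silently assigning them to a class.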

🚀 SciFive Pubmed+PMC Base

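SciFive Pubmed+PMC Base is a text-to-text transformer model for processing biomedical literature. It can be applied to a range of natural language processing tasks, such as token classification, text classification, question answering, and text generation.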

🚀 Quick start

For more details, see our GitHub repository.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Use a GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("razent/SciFive-base-Pubmed_PMC")
# The model must live on the same device as its inputs.
model = AutoModelForSeq2SeqLM.from_pretrained("razent/SciFive-base-Pubmed_PMC").to(device)

sentence = "Identification of APC2 , a homologue of the adenomatous polyposis coli tumour suppressor ."

# Current T5 tokenizers append the </s> end-of-sequence token themselves,
# and a single sentence needs no padding.
encoding = tokenizer(sentence, return_tensors="pt").to(device)

outputs = model.generate(
    input_ids=encoding["input_ids"],
    attention_mask=encoding["attention_mask"],
    max_length=256,
)

# Decode each generated sequence back into plain text.
for output in outputs:
    line = tokenizer.decode(output, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    print(line)
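
The quick-start snippet handles a single sentence. For a larger collection, a common pattern is to group sentences into batches and pad each batch to its own longest member. The sketch below shows one way to do that, assuming `tokenizer` and `model` are loaded as in the quick start. The `chunked` and `generate_in_batches` helpers and the default batch size of 8 are illustrative choices, not part of the SciFive repository.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def generate_in_batches(sentences, tokenizer, model, batch_size=8, max_length=256):
    """Run `model.generate` over `sentences` one padded batch at a time."""
    results = []
    for batch in chunked(sentences, batch_size):
        # padding=True pads each batch only to its own longest sentence.
        encoding = tokenizer(batch, padding=True, truncation=True, return_tensors="pt")
        # Move the inputs to whichever device the model is on.
        encoding = encoding.to(model.device)
        outputs = model.generate(**encoding, max_length=max_length)
        results.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return results
```

A call would look like `generate_in_batches(list_of_sentences, tokenizer, model)`. Smaller batch sizes reduce memory use at the cost of more forward passes.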

📚 Documentation

Paper information

Paper: SciFive: a text-to-text transformer model for biomedical literature

Authors: Long N. Phan, James T. Anibal, Hieu Tran, Shaurya Chanana, Erol Bahadroglu, Alec Peltekian, Grégoire Altan-Bonnet