🚀 BioM-Transformers: Building Large Biomedical Language Models with BERT, ALBERT and ELECTRA
BioM-Transformers investigates the impact of different design choices on the performance of biomedical language models and shows empirically that state-of-the-art results can be reached on several biomedical domain tasks at a similar or lower computational cost.
🚀 Quick Start
Model Training
Fine-tune the model on the BioASQ8B training set with the following script:
python3 run_squad.py --model_type electra --model_name_or_path sultan/BioM-ELECTRA-Large-SQuAD2 \
--train_file BioASQ8B/train.json \
--predict_file BioASQ8B/dev.json \
--do_lower_case \
--do_train \
--do_eval \
--threads 20 \
--version_2_with_negative \
--num_train_epochs 3 \
--learning_rate 5e-5 \
--max_seq_length 512 \
--doc_stride 128 \
--per_gpu_train_batch_size 8 \
--gradient_accumulation_steps 2 \
--per_gpu_eval_batch_size 128 \
--logging_steps 50 \
--save_steps 5000 \
--fp16 \
--fp16_opt_level O1 \
--overwrite_output_dir \
--output_dir BioM-ELECTRA-Large-SQuAD-BioASQ \
--overwrite_cache
✨ Key Features
- Fine-tuning setup: The model was first fine-tuned on the SQuAD2.0 dataset and then on the BioASQ8B-Factoid training set. We converted the BioASQ8B-Factoid training set into SQuAD1.1 format and trained and evaluated the model (BioM-ELECTRA-Base-SQuAD2) on it; a hedged conversion sketch is shown after this list.
- Direct inference: You can use the model directly for prediction (inference) without further fine-tuning. Paste a PubMed abstract into the context box on the model card and ask biomedical questions about that context to compare its behavior with the original ELECTRA model. The model is also useful for building pandemic question-answering systems such as a COVID-19 QA system; see the pipeline sketch after this list.
- Version differences: Note that this (PyTorch) version differs from the one we used in our BioASQ9B participation (TensorFlow with layer-wise decay). We merged all five batches of the BioASQ8B test dataset into a single dev.json file.
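The sketch below is a rough illustration of the BioASQ-to-SQuAD1.1 conversion mentioned above; it is not the authors' script, which is not included in this card. The field names (`body`, `type`, `exact_answer`, `snippets`) follow the public BioASQ JSON schema, and de-duplication and edge cases are omitted.

```python
import json

def bioasq_factoid_to_squad(bioasq_path, squad_path):
    """Flatten BioASQ factoid questions into SQuAD1.1-style JSON (illustrative sketch)."""
    with open(bioasq_path) as f:
        questions = json.load(f)["questions"]

    paragraphs = []
    for qi, q in enumerate(questions):
        if q.get("type") != "factoid":
            continue
        # exact_answer may be a list of strings or a list of synonym lists
        answers = []
        for entry in q.get("exact_answer", []):
            answers.extend(entry if isinstance(entry, list) else [entry])
        # every snippet that contains an exact answer becomes one context/question/answer triple
        for si, snippet in enumerate(q.get("snippets", [])):
            context = snippet["text"]
            spans = [{"text": a, "answer_start": context.find(a)}
                     for a in answers if a in context]
            if spans:
                paragraphs.append({
                    "context": context,
                    "qas": [{"id": f"{qi}_{si}", "question": q["body"], "answers": spans}],
                })

    with open(squad_path, "w") as f:
        json.dump({"version": "1.1",
                   "data": [{"title": "BioASQ8B-Factoid", "paragraphs": paragraphs}]}, f)
```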
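As a minimal sketch of the direct-inference use case, the snippet below queries the fine-tuned checkpoint through the Hugging Face `transformers` question-answering pipeline. The model ID and the example context/question are illustrative assumptions, not part of the original card.

```python
from transformers import pipeline

# Assumed Hub model ID; point this to your own output_dir if you fine-tuned the model yourself.
qa = pipeline("question-answering", model="sultan/BioM-ELECTRA-Large-SQuAD2-BioASQ8B")

# Illustrative PubMed-style context and biomedical question.
context = (
    "BCL11A is a major regulator of fetal hemoglobin (HbF) levels, and "
    "down-regulation of BCL11A in adult erythroid cells induces HbF."
)
result = qa(question="Which gene regulates fetal hemoglobin levels?", context=context)
print(result["answer"], result["score"])
```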
📚 Documentation
Model Performance Comparison
Below are unofficial results comparing our models with the original ELECTRA Base and Large models:
| Model | Exact Match (EM) | F1 Score |
|---|---|---|
| ELECTRA-Base-SQuAD2-BioASQ8B | 61.89 | 74.39 |
| BioM-ELECTRA-Base-SQuAD2-BioASQ8B | 70.31 | 80.90 |
| ELECTRA-Large-SQuAD2-BioASQ8B | 67.36 | 78.90 |
| BioM-ELECTRA-Large-SQuAD2-BioASQ8B | 74.31 | 84.72 |
📄 Acknowledgements
We would like to acknowledge the TensorFlow Research Cloud (TFRC) team for providing access to TPUv3 units.
📄 Citation
@inproceedings{alrowili-shanker-2021-biom,
title = "{B}io{M}-Transformers: Building Large Biomedical Language Models with {BERT}, {ALBERT} and {ELECTRA}",
author = "Alrowili, Sultan and
Shanker, Vijay",
booktitle = "Proceedings of the 20th Workshop on Biomedical Language Processing",
month = jun,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2021.bionlp-1.24",
pages = "221--227",
abstract = "The impact of design choices on the performance of biomedical language models recently has been a subject for investigation. In this paper, we empirically study biomedical domain adaptation with large transformer models using different design choices. We evaluate the performance of our pretrained models against other existing biomedical language models in the literature. Our results show that we achieve state-of-the-art results on several biomedical domain tasks despite using similar or less computational cost compared to other models in the literature. Our findings highlight the significant effect of design choices on improving the performance of biomedical language models.",
}