🚀 BioM-Transformers: Building Large Biomedical Language Models with BERT, ALBERT and ELECTRA
BioM-Transformers studies how different design choices affect the performance of biomedical language models, and shows empirically that the resulting models achieve state-of-the-art results on several biomedical domain tasks at similar or lower computational cost.
🚀 Quick Start
Model Training
Fine-tune the model with the following script:
```bash
python3 run_squad.py --model_type electra --model_name_or_path sultan/BioM-ELECTRA-Large-SQuAD2 \
--train_file BioASQ8B/train.json \
--predict_file BioASQ8B/dev.json \
--do_lower_case \
--do_train \
--do_eval \
--threads 20 \
--version_2_with_negative \
--num_train_epochs 3 \
--learning_rate 5e-5 \
--max_seq_length 512 \
--doc_stride 128 \
--per_gpu_train_batch_size 8 \
--gradient_accumulation_steps 2 \
--per_gpu_eval_batch_size 128 \
--logging_steps 50 \
--save_steps 5000 \
--fp16 \
--fp16_opt_level O1 \
--overwrite_output_dir \
--output_dir BioM-ELECTRA-Large-SQuAD-BioASQ \
--overwrite_cache
```
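After training, the fine-tuned checkpoint can be loaded like any other `transformers` checkpoint. A minimal loading sketch, assuming the `--output_dir` used in the command above:

```python
# Minimal sketch: load the fine-tuned checkpoint saved by run_squad.py.
# "BioM-ELECTRA-Large-SQuAD-BioASQ" is the --output_dir from the command above.
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_dir = "BioM-ELECTRA-Large-SQuAD-BioASQ"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForQuestionAnswering.from_pretrained(model_dir)
```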
✨ Key Features
- Fine-tuning setup: The model was first fine-tuned on the SQuAD2.0 dataset and then on the BioASQ8B-Factoid training dataset. We converted the BioASQ8B-Factoid training dataset into SQuAD1.1 format and used it to train and evaluate the model (BioM-ELECTRA-Base-SQuAD2).
- Direct inference: You can use the model directly for prediction (inference) without further fine-tuning. Paste a PubMed abstract into the context box on the model card, ask biomedical questions about the given context, and compare its answers with those of the original ELECTRA model; see the sketch after this list. The model is also useful for building pandemic question-answering systems, such as a COVID-19 QA system.
- Version differences: Note that this (PyTorch) version differs from the version we used in our BioASQ9B participation (TensorFlow with layer-wise decay). We merged all five batches of the BioASQ8B testing dataset into a single dev.json file.
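The snippet below is a minimal sketch of this kind of direct inference using the Hugging Face `transformers` question-answering pipeline with the `sultan/BioM-ELECTRA-Large-SQuAD2` checkpoint referenced in the training command above; the context and question are illustrative placeholders, not taken from BioASQ.

```python
# Minimal direct-inference sketch with the transformers question-answering pipeline.
# The model name comes from the training command above; the context and question
# below are illustrative placeholders only.
from transformers import pipeline

qa = pipeline("question-answering", model="sultan/BioM-ELECTRA-Large-SQuAD2")

context = (
    "Metformin is a first-line medication for the treatment of type 2 diabetes. "
    "It lowers blood glucose mainly by decreasing hepatic glucose production."
)
question = "What is a first-line medication for type 2 diabetes?"

result = qa(question=question, context=context)
print(result["answer"], result["score"])
```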
📚 Documentation
Model Performance Comparison
Below is an unofficial comparison of our models with the original ELECTRA base and large models:
| Model | Exact Match (EM) | F1 |
|-------|------------------|----|
| ELECTRA-Base-SQuAD2-BioASQ8B | 61.89 | 74.39 |
| BioM-ELECTRA-Base-SQuAD2-BioASQ8B | 70.31 | 80.90 |
| ELECTRA-Large-SQuAD2-BioASQ8B | 67.36 | 78.90 |
| BioM-ELECTRA-Large-SQuAD2-BioASQ8B | 74.31 | 84.72 |
📄 Acknowledgment
We would like to acknowledge the support of the TensorFlow Research Cloud (TFRC) team for providing us with access to TPUv3 units.
📄 Citation
```bibtex
@inproceedings{alrowili-shanker-2021-biom,
    title = "{B}io{M}-Transformers: Building Large Biomedical Language Models with {BERT}, {ALBERT} and {ELECTRA}",
    author = "Alrowili, Sultan and
      Shanker, Vijay",
    booktitle = "Proceedings of the 20th Workshop on Biomedical Language Processing",
    month = jun,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.bionlp-1.24",
    pages = "221--227",
    abstract = "The impact of design choices on the performance of biomedical language models recently has been a subject for investigation. In this paper, we empirically study biomedical domain adaptation with large transformer models using different design choices. We evaluate the performance of our pretrained models against other existing biomedical language models in the literature. Our results show that we achieve state-of-the-art results on several biomedical domain tasks despite using similar or less computational cost compared to other models in the literature. Our findings highlight the significant effect of design choices on improving the performance of biomedical language models.",
}
```