VetBERT: An Open-Source Model for Processing Veterinary Clinical Notes

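Developed by havocy28

VetBERT is a pretrained language model designed for processing veterinary clinical notes. It starts from Bio_ClinicalBERT and is further trained on veterinary-domain data.

Category: Large language model

Library: Transformers

Language: English | License: OpenRAIL | Tags: #Veterinary clinical note processing #Disease syndrome classification #Biomedical NLP

Downloads: 87

Released: 3/2/2022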

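Model Overview

The model targets NLP tasks over veterinary clinical notes and is optimized for veterinary-specific tasks such as disease syndrome classification.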

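Model Features

Veterinary-domain specialization

Further pretrained on veterinary clinical data starting from Bio_ClinicalBERT, which gives it domain-specific coverage of veterinary medicine.

Large-scale training data

The training data comprises more than 15 million veterinary clinical records and 1.3 billion tokens.

Optimized for disease classification

Optimized for disease syndrome classification, with a dedicated fine-tuned version, VetBERTDx.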

Model Capabilities

Understanding veterinary clinical text

Disease syndrome classification

Fill-mask prediction on clinical notes

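Use Cases

Veterinary clinical diagnosis

Postoperative recheck

Analyze postoperative examination notes and predict likely findings

Blood glucose testing

Analyze blood glucose test notes and predict likely values

Animal health monitoring

Early disease detection

Detect early signs of disease by analyzing clinical notes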

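🚀 VetBERT: A Pretrained Model for Veterinary Clinical Tasks

VetBERT is a pretrained model designed for natural language processing (NLP) tasks on veterinary clinical notes. It supports information processing in veterinary practice and can make clinical diagnosis and research more efficient.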

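🚀 Quick Start

The pretrained model comes from the GitHub repository https://github.com/havocy28/VetBERT. It can be loaded as follows:

from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# Load the VetBERT tokenizer and masked-language-model weights
tokenizer = AutoTokenizer.from_pretrained("havocy28/VetBERT")
model = AutoModelForMaskedLM.from_pretrained("havocy28/VetBERT")

# Build a fill-mask pipeline and predict the [MASK] token (note text kept verbatim, misspellings included)
VetBERT_masked = pipeline("fill-mask", model=model, tokenizer=tokenizer)
VetBERT_masked('Suspected pneuomina, will require an [MASK] but in the meantime will prescribed antibiotics')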
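
A fill-mask pipeline in transformers returns a list of candidate dictionaries, each carrying a `score`, a `token_str`, and the completed `sequence`. The sketch below shows one way to rank and trim that list. The helper names `top_candidates` and `predict_mask` are hypothetical and not part of VetBERT or transformers, and no particular model output is implied.

```python
from typing import Dict, List, Tuple


def top_candidates(predictions: List[Dict], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (token, score) pairs from fill-mask output.

    `predictions` is expected to be the list of dicts that a fill-mask
    pipeline returns for a text containing a single [MASK] token.
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"].strip(), round(float(p["score"]), 4)) for p in ranked[:k]]


def predict_mask(text: str, k: int = 3) -> List[Tuple[str, float]]:
    """Run VetBERT on `text`; downloads the model on first use."""
    # Imported lazily so top_candidates can be used without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="havocy28/VetBERT")
    return top_candidates(fill_mask(text, top_k=k), k)
```

Calling `predict_mask` with the example note from the quick start would return up to `k` candidate tokens for the masked position, ordered by score.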

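✨ Key Features

Domain fit: The model is designed for NLP tasks on veterinary clinical notes and is specialized for veterinary medicine.
Deep pretraining: Initialized from ClinicalBERT (Bio + Clinical BERT) and further pretrained on the VetCompass Australia corpus, so it handles veterinary text better.
Fine-tunable: Can be fine-tuned on specific veterinary clinical datasets to fit different disease syndrome classification tasks.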

📦 Installation

To use the model, install the transformers library:

pip install transformers

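📚 Documentation

Pretraining Data

VetBERT was initialized from Bio_ClinicalBERT, which was itself initialized from BERT. VetBERT was then trained on more than 15 million veterinary clinical records and 1.3 billion tokens.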

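Pretraining Hyperparameters

The following hyperparameters were used during VetBERT pretraining:

Batch size: 32
Maximum sequence length: 512
Learning rate: 5 × 10⁻⁵
Duplication factor for input data (with different masks): 5

All other default parameters were left unchanged (specifically, masked language model probability = 0.15 and maximum predictions per sequence = 20).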
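
To see how the masking probability and the per-sequence prediction cap interact, here is a small sketch. It assumes VetBERT's data generation followed the rule in the original BERT pretraining script (at least one masked position, at most the cap), which the source does not state explicitly. The function name `num_masked_tokens` is illustrative.

```python
def num_masked_tokens(seq_len: int,
                      masked_lm_prob: float = 0.15,
                      max_predictions_per_seq: int = 20) -> int:
    """Number of positions selected for masking in one sequence.

    Assumed rule (from the original BERT data-generation script):
    round(seq_len * masked_lm_prob), clamped to [1, max_predictions_per_seq].
    """
    return min(max_predictions_per_seq,
               max(1, int(round(seq_len * masked_lm_prob))))


# Compare a few sequence lengths, up to the 512-token maximum listed above.
for seq_len in (16, 64, 128, 512):
    print(seq_len, num_masked_tokens(seq_len))
```

Under this assumption, sequences longer than roughly 130 tokens reach the cap of 20, so a full 512-token sequence would have about 4% of its positions selected for prediction, not 15%.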

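Fine-Tuning VetBERT

As described in the paper Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes, VetBERT was further fine-tuned on a set of 5,002 annotated clinical notes to classify the disease syndromes associated with each note.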
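
For readers who want to set up a comparable classification task, the sketch below loads the VetBERT weights with a fresh classification head. It is not the authors' training procedure and does not reproduce the paper's domain adaptation or instance selection. The helper names and the syndrome labels used in the usage note are hypothetical placeholders.

```python
from typing import Dict, List, Tuple


def build_label_maps(labels: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map each distinct syndrome label to an integer id, and back."""
    unique = sorted(set(labels))
    label2id = {name: i for i, name in enumerate(unique)}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label


def load_classifier(label2id: Dict[str, int], id2label: Dict[int, str]):
    """Load VetBERT with a new, untrained sequence-classification head."""
    # Imported lazily so build_label_maps works without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("havocy28/VetBERT")
    model = AutoModelForSequenceClassification.from_pretrained(
        "havocy28/VetBERT",
        num_labels=len(label2id),
        label2id=label2id,
        id2label=id2label,
    )
    return tokenizer, model
```

For example, `build_label_maps(["respiratory", "gastrointestinal", "respiratory"])` yields two ids (placeholder labels, not the paper's label set). The classification head returned by `load_classifier` is randomly initialized, so it needs training on annotated notes before its predictions mean anything.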

📄 License

This model is released under the OpenRAIL license.

📚 Citation

If you use this model, please cite the following paper: Brian Hur, Timothy Baldwin, Karin Verspoor, Laura Hardefeldt, and James Gilkerson. 2020. Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes. In Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing, pages 156–166, Online. Association for Computational Linguistics.