gliner-biomed-base-v1.0: an open-source biomedical named entity recognition model that recognizes many entity types

Gliner Biomed Base V1.0

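Developed by Ihor

GLiNER-biomed is a specialized biomedical named entity recognition model built on the GLiNER framework. It can recognize many biomedical entity types.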

Sequence labeling

PyTorch

English · License: Apache-2.0 · #BiomedicalNER #ZeroShotRecognition #MultipleEntityTypes

Downloads: 61

Released: 2/19/2025

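Model overview

The model is trained on synthetic annotations distilled from generative biomedical large language models, and achieves state-of-the-art zero-shot and few-shot performance on biomedical entity recognition tasks.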

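Model features

Biomedical focus

A named entity recognition model optimized specifically for the biomedical domain.

Zero-shot and few-shot learning

Maintains good performance with little or no annotated data.

Multi-type entity recognition

Recognizes multiple biomedical entity types in a single pass.

Efficient inference

Uses fewer resources and runs faster than large language models.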

Model capabilities

Biomedical entity recognition

Multi-type entity detection

Zero-shot learning

Few-shot learning

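Use cases

Medical text analysis

Entity extraction from electronic health records

Extracts key information such as diseases, drugs, and lab results from electronic health records.

Accurately identifies multiple clinical entity types.

Information extraction from medical literature

Extracts key entities from medical research literature.

Supports recognition of multiple biomedical entity types.

Clinical decision support

Information extraction from medication orders

Extracts drugs, dosages, frequencies, and similar details from medication order text.

Accurately identifies drug-related entities.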
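
As a rough illustration of the medication-order use case, the sketch below turns a flat list of recognized entities into one record per drug. It assumes entities arrive in document order as dictionaries with `text` and `label` keys (the shape printed in this card's usage example) and uses the label names "Drug", "Drug dosage", and "Drug frequency" from that example. The helper `build_medication_records` and its "attach to the most recent drug" heuristic are hypothetical post-processing, not part of the GLiNER library. The entities are hard-coded so the sketch runs without downloading the model.

```python
def build_medication_records(entities):
    """Attach each dosage/frequency entity to the most recent preceding drug."""
    records = []
    for entity in entities:
        label, text = entity["label"], entity["text"]
        if label == "Drug":
            # A new drug mention starts a new record.
            records.append({"drug": text, "dosage": None, "frequency": None})
        elif label == "Drug dosage" and records:
            records[-1]["dosage"] = text
        elif label == "Drug frequency" and records:
            records[-1]["frequency"] = text
    return records


# Hard-coded entities, in the order they appear in the text.
entities = [
    {"text": "Metformin", "label": "Drug"},
    {"text": "500mg", "label": "Drug dosage"},
    {"text": "twice daily", "label": "Drug frequency"},
    {"text": "Lisinopril", "label": "Drug"},
    {"text": "10mg", "label": "Drug dosage"},
    {"text": "once daily", "label": "Drug frequency"},
]

records = build_medication_records(entities)
for record in records:
    print(record)
```

The heuristic only works when the dosage and frequency follow the drug name, as in "Metformin 500mg twice daily". Text that states the dose first would need a different pairing rule.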

🚀 GLiNER-BioMed

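GLiNER-BioMed is a suite of efficient named entity recognition (NER) models specialized for the biomedical domain. Built on the GLiNER framework and trained with synthetic annotations distilled from large generative biomedical language models, it achieves state-of-the-art zero-shot and few-shot performance on biomedical entity recognition, offering a practical alternative to both traditional NER models and large language models.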

🚀 Quick start

Installation

Install the official GLiNER library:

pip install gliner -U

Usage

Once the GLiNER library is installed, you can load the GLiNER-biomed model and run named entity recognition:

from gliner import GLiNER

# Load the pretrained checkpoint (downloaded on first use).
model = GLiNER.from_pretrained("Ihor/gliner-biomed-base-v1.0")

text = """
The patient, a 45-year-old male, was diagnosed with type 2 diabetes mellitus and hypertension.
He was prescribed Metformin 500mg twice daily and Lisinopril 10mg once daily.
A recent lab test showed elevated HbA1c levels at 8.2%.
"""

# Entity types to extract, given as free-text labels at inference time.
labels = ["Disease", "Drug", "Drug dosage", "Drug frequency", "Lab test", "Lab test value", "Demographic information"]

# Low-confidence predictions are filtered out by the 0.5 threshold.
entities = model.predict_entities(text, labels, threshold=0.5)

# Print each matched span next to its predicted label.
for entity in entities:
    print(entity["text"], "=>", entity["label"])

Expected output:

45-year-old male => Demographic information
type 2 diabetes mellitus => Disease
hypertension => Disease
Metformin => Drug
500mg => Drug dosage
twice daily => Drug frequency
Lisinopril => Drug
10mg => Drug dosage
once daily => Drug frequency
HbA1c levels => Lab test
8.2% => Lab test value
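
Downstream code often wants the predictions grouped by entity type instead of a flat list. The sketch below shows one way to do that. It relies only on the `text` and `label` keys that the usage example reads from each entity; the helper name `group_by_label` is hypothetical, and the entities are copied from the expected output so the snippet runs without loading the model.

```python
from collections import defaultdict

# Entities copied from the expected output; only "text" and "label" are used.
entities = [
    {"text": "45-year-old male", "label": "Demographic information"},
    {"text": "type 2 diabetes mellitus", "label": "Disease"},
    {"text": "hypertension", "label": "Disease"},
    {"text": "Metformin", "label": "Drug"},
    {"text": "500mg", "label": "Drug dosage"},
    {"text": "twice daily", "label": "Drug frequency"},
    {"text": "Lisinopril", "label": "Drug"},
    {"text": "10mg", "label": "Drug dosage"},
    {"text": "once daily", "label": "Drug frequency"},
    {"text": "HbA1c levels", "label": "Lab test"},
    {"text": "8.2%", "label": "Lab test value"},
]


def group_by_label(entities):
    """Collect entity texts under their label, keeping first-seen order."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["label"]].append(entity["text"])
    return dict(grouped)


grouped = group_by_label(entities)
for label, texts in grouped.items():
    print(f"{label}: {', '.join(texts)}")
```

In real use, the hard-coded list would be replaced by the return value of `model.predict_entities(...)`.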

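✨ Key features

Broad entity recognition: GLiNER is a named entity recognition (NER) model that can identify any entity type using a bidirectional transformer encoder (BERT-like).
Efficient biomedical models: GLiNER-biomed, developed in collaboration with DS4DH at the University of Geneva, introduces a specialized suite of efficient open biomedical NER models built on the GLiNER framework.
Zero-shot and few-shot learning: using synthetic annotations distilled from large generative biomedical language models, the models achieve state-of-the-art zero-shot and few-shot performance on biomedical entity recognition tasks.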

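📚 Detailed documentation

Model information

Property	Details
Base model	microsoft/deberta-v3-base
Datasets	knowledgator/GLINER-multi-task-synthetic-data, knowledgator/biomed_NER
Language	English
Library name	gliner
License	apache-2.0
Evaluation metric	f1
Task type	Token classification
Tags	NER, GLiNER, information extraction, encoder, entity recognition, biomedical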

Benchmarks

We tested the models on 8 complex real-world datasets and compared them with other GLiNER models:

Model	F1 score	Macro-average F1	Macro-median F1	Weighted F1
Large models
NuNER Zero	40.87	21.79	13.94	33.67
NuNER Zero span	40.26	22.51	14.27	32.52
GLiNER bio v0.1	42.34	27.10	24.44	38.32
GLiNER bio v0.2	38.66	25.36	17.02	32.42
GLiNER v1.0	47.77	29.60	21.13	40.78
GLiNER v2.0	37.38	21.42	15.44	33.11
GLiNER v2.1	48.04	29.75	28.20	43.43
GLiNER news v2.1	48.99	31.79	33.77	45.13
GLiNER v2.5	53.81	35.22	35.65	51.57
GLiNER-biomed	59.77	40.67	42.65	58.40
GLiNER-biomed-bi	54.90	35.78	31.66	50.46
Base models
GLiNER v1.0	41.61	24.98	10.27	31.59
GLiNER v2.0	34.33	24.48	22.01	30.58
GLiNER v2.1	40.25	25.26	14.41	32.64
GLiNER news v2.1	41.59	27.16	17.74	34.44
GLiNER v2.5	46.49	30.93	25.26	44.68
GLiNER-biomed	54.37	36.20	41.61	53.05
GLiNER-biomed-bi	58.31	35.22	32.39	54.91
Small models
GLiNER v1.0	40.99	22.81	7.86	31.15
GLiNER v2.0	33.55	21.12	15.76	28.78
GLiNER v2.1	38.45	23.25	10.92	30.67
GLiNER news v2.1	39.15	24.96	14.48	33.10
GLiNER v2.5	38.21	28.53	18.01	36.88
GLiNER-biomed	52.53	34.49	38.17	50.87
GLiNER-biomed-bi	56.93	33.88	33.61	53.12
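
The table reports four F1 variants but the card does not say how each one is aggregated. The sketch below shows one common reading, offered as an assumption and not as the benchmark's actual procedure: a pooled (micro) F1, the mean and median of per-type F1, and a support-weighted F1. The per-type counts are made up for illustration and do not come from the benchmark.

```python
from statistics import mean, median

# Made-up span-level counts per entity type (illustration only).
counts = {
    "Disease": {"tp": 8, "fp": 2, "fn": 2},
    "Drug": {"tp": 3, "fp": 1, "fn": 3},
    "Lab test": {"tp": 1, "fp": 3, "fn": 3},
}


def f1(tp, fp, fn):
    """F1 from true/false positives and false negatives; 0.0 if nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# F1 for each entity type, and the number of gold spans per type.
per_type = {label: f1(**c) for label, c in counts.items()}
support = {label: c["tp"] + c["fn"] for label, c in counts.items()}

# Micro F1: pool the counts over all types, then compute a single F1.
micro_f1 = f1(
    sum(c["tp"] for c in counts.values()),
    sum(c["fp"] for c in counts.values()),
    sum(c["fn"] for c in counts.values()),
)
# Macro variants: every entity type counts equally.
macro_mean_f1 = mean(per_type.values())
macro_median_f1 = median(per_type.values())
# Weighted F1: per-type F1 weighted by the number of gold spans.
weighted_f1 = sum(per_type[k] * support[k] for k in counts) / sum(support.values())

print(f"micro={micro_f1:.2%} macro-mean={macro_mean_f1:.2%} "
      f"macro-median={macro_median_f1:.2%} weighted={weighted_f1:.2%}")
```

Under this reading, a large gap between the pooled score and the macro scores indicates that a few frequent entity types dominate the pooled number while rarer types score lower.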

Join our Discord

Connect with our community on Discord for the latest news, support, and discussion about our models.

📄 License

This project is released under the apache-2.0 license.

🔗 Citation

This work

If you use GLiNER-biomed models in your work, please cite:

@misc{yazdani2025glinerbiomedsuiteefficientmodels,
      title={GLiNER-biomed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition},
      author={Anthony Yazdani and Ihor Stepanov and Douglas Teodoro},
      year={2025},
      eprint={2504.00676},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.00676},
}

Previous work

@misc{zaratiana2023gliner,
      title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
      author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
      year={2023},
      eprint={2311.08526},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@misc{stepanov2024gliner,
      title={GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks},
      author={Ihor Stepanov and Mykhailo Shtopko},
      year={2024},
      eprint={2406.12925},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}