🚀 GLiNER-BioMed
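GLiNER-BioMed is a suite of efficient, open-source biomedical named entity recognition (NER) models built on the GLiNER framework. Trained on synthetic annotations distilled from large generative biomedical language models, it achieves state-of-the-art zero-shot and few-shot performance on biomedical entity recognition tasks.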
🚀 Quick Start
Install the GLiNER library
Install the official GLiNER library with pip:
```bash
pip install gliner -U
```
Load the model and run named entity recognition
Once the GLiNER library is installed, you can load a GLiNER-biomed model and run named entity recognition:
```python
from gliner import GLiNER

# Load the small biomedical checkpoint from the Hugging Face Hub
model = GLiNER.from_pretrained("Ihor/gliner-biomed-small-v1.0")
text = """
The patient, a 45-year-old male, was diagnosed with type 2 diabetes mellitus and hypertension.
He was prescribed Metformin 500mg twice daily and Lisinopril 10mg once daily.
A recent lab test showed elevated HbA1c levels at 8.2%.
"""
# Entity types are free-form strings chosen at inference time
labels = ["Disease", "Drug", "Drug dosage", "Drug frequency", "Lab test", "Lab test value", "Demographic information"]
# Keep only spans whose confidence score is at least 0.5
entities = model.predict_entities(text, labels, threshold=0.5)

for entity in entities:
    print(entity["text"], "=>", entity["label"])
```
Expected output:
```text
45-year-old male => Demographic information
type 2 diabetes mellitus => Disease
hypertension => Disease
Metformin => Drug
500mg => Drug dosage
twice daily => Drug frequency
Lisinopril => Drug
10mg => Drug dosage
once daily => Drug frequency
HbA1c levels => Lab test
8.2% => Lab test value
```
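The loop above prints one line per detected span. If you would rather collect the results per entity type, a small post-processing helper is enough. This is a minimal sketch, assuming each item returned by `predict_entities` is a dict carrying at least `text`, `label` and `score` keys (the loop above already relies on the first two); `group_by_label` and the sample scores are hypothetical and not part of the GLiNER API.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Group entity dicts by label, keeping spans with score >= min_score.

    Assumes each entity is a dict with "text", "label" and "score" keys
    (assumed output shape of GLiNER's predict_entities).
    """
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["label"]].append(ent["text"])
    return dict(grouped)


# Hypothetical predictions in the assumed format; the scores are made up.
sample = [
    {"text": "Metformin", "label": "Drug", "score": 0.93},
    {"text": "500mg", "label": "Drug dosage", "score": 0.88},
    {"text": "Lisinopril", "label": "Drug", "score": 0.91},
    {"text": "10mg", "label": "Drug dosage", "score": 0.42},
]

print(group_by_label(sample, min_score=0.5))
# {'Drug': ['Metformin', 'Lisinopril'], 'Drug dosage': ['500mg']}
```

Applying a stricter `min_score` here filters the model's output a second time, independently of the `threshold` passed to `predict_entities`.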
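✨ Key Features
- Flexible entity recognition: GLiNER is a named entity recognition (NER) model that can identify any entity type using a bidirectional transformer encoder (BERT-like). It is a practical alternative both to traditional NER models, which are limited to predefined entity types, and to large language models (LLMs), which are flexible but costly and too large for resource-constrained settings.
- Biomedical focus: GLiNER-biomed, developed in collaboration with DS4DH at the University of Geneva, is a suite of efficient open-source biomedical NER models built on the GLiNER framework.
- State-of-the-art performance: using synthetic annotations distilled from large generative biomedical language models, GLiNER-biomed achieves state-of-the-art zero-shot and few-shot performance on biomedical entity recognition tasks.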
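To make the span-based matching behind these features concrete, here is a toy sketch, assuming the formulation from the original GLiNER paper: every candidate span and every entity type is mapped to a vector, a (span, type) pair is scored as the sigmoid of their dot product, pairs below the threshold are dropped, and overlapping spans are resolved greedily by score (flat NER). The two-dimensional embeddings are hard-coded for illustration; in the real model they come from the DeBERTa encoder. `score_spans` and `greedy_decode` are hypothetical helper names, not library functions.

```python
import math

# Toy tokenized sentence; spans are (start, end) token indices, end inclusive.
tokens = ["type", "2", "diabetes", "and", "Metformin"]

# Hypothetical 2-d embeddings; the real model derives these from its encoder.
label_embs = {"Disease": [1.0, 0.0], "Drug": [0.0, 1.0]}
span_embs = {(0, 2): [3.0, -1.0], (2, 2): [2.0, -1.0], (4, 4): [-1.0, 2.5]}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def score_spans(span_embs, label_embs):
    """Score every (span, label) pair as sigmoid(span_vec . label_vec)."""
    return {
        (span, label): sigmoid(sum(a * b for a, b in zip(s_vec, l_vec)))
        for span, s_vec in span_embs.items()
        for label, l_vec in label_embs.items()
    }


def greedy_decode(scores, threshold=0.5):
    """Drop pairs below threshold, then keep non-overlapping spans, best score first."""
    candidates = sorted(
        ((p, span, label) for (span, label), p in scores.items() if p >= threshold),
        reverse=True,
    )
    chosen = []
    for p, (start, end), label in candidates:
        # Accept a span only if it shares no token with an already accepted span.
        if all(end < s or start > e for (s, e), _, _ in chosen):
            chosen.append(((start, end), label, p))
    return sorted(chosen)


for (start, end), label, p in greedy_decode(score_spans(span_embs, label_embs)):
    print(" ".join(tokens[start : end + 1]), "=>", label, round(p, 3))
# type 2 diabetes => Disease 0.953
# Metformin => Drug 0.924
```

The span `diabetes` also scores above 0.5 as a Disease, but it is discarded because it overlaps the higher-scoring `type 2 diabetes`. Because entity types enter only as vectors to be matched, new labels can be supplied at inference time without retraining.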
📚 Documentation
The model is described in the paper GLiNER-biomed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition.
The code is available at https://github.com/ds4dh/GLiNER-biomed.
🔧 Technical Details
Model information
| Property | Details |
|---|---|
| Base model | microsoft/deberta-v3-small |
| Datasets | knowledgator/GLINER-multi-task-synthetic-data, knowledgator/biomed_NER |
| Language | en |
| Library | gliner |
| License | apache-2.0 |
| Metric | f1 |
| Task | token-classification |
| Tags | NER, GLiNER, information extraction, encoder, entity recognition, biomedical |
Benchmarks
We evaluated the models on 8 complex real-world datasets and compared them with other GLiNER models.
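The model information lists F1 as the evaluation metric, but the matching protocol is not spelled out here. A common convention for NER benchmarks is strict, micro-averaged entity-level F1, where a prediction counts as correct only if its boundaries and its type both match a gold entity. The sketch below implements that convention as an assumption; `entity_f1` and the toy data are illustrative and not taken from the GLiNER-biomed evaluation code.

```python
def entity_f1(gold, pred):
    """Micro-averaged entity-level F1 with exact matching.

    gold, pred: iterables of (doc_id, start, end, label) tuples. A prediction
    is a true positive only if all four fields match a gold entity.
    """
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: the second prediction has the right span but the wrong type.
gold = [(0, 0, 2, "Disease"), (0, 4, 4, "Drug"), (1, 3, 3, "Drug")]
pred = [(0, 0, 2, "Disease"), (0, 4, 4, "Disease"), (1, 3, 3, "Drug")]
print(round(entity_f1(gold, pred), 3))
# 0.667
```

Counts are pooled over all documents before computing precision and recall, which is what makes this a micro average.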
📄 License
This project is licensed under the Apache-2.0 license.
💬 Join Our Discord Community
Connect with our community on Discord for news, support, and discussion about our models. Join at https://discord.gg/dkyeAgs9DG.
📖 Citation
This work
If you use GLiNER-biomed models in your work, please cite:
```bibtex
@misc{yazdani2025glinerbiomedsuiteefficientmodels,
  title={GLiNER-biomed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition},
  author={Anthony Yazdani and Ihor Stepanov and Douglas Teodoro},
  year={2025},
  eprint={2504.00676},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2504.00676},
}
```
Previous work
```bibtex
@misc{zaratiana2023gliner,
  title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
  author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
  year={2023},
  eprint={2311.08526},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@misc{stepanov2024gliner,
  title={GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks},
  author={Ihor Stepanov and Mykhailo Shtopko},
  year={2024},
  eprint={2406.12925},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```