🚀 GLiNER-BioMed
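GLiNER-BioMed is a suite of named entity recognition (NER) models specialized for the biomedical domain. Built on bidirectional transformer encoders (BERT-like), the models can identify arbitrary entity types. This makes them a practical alternative both to traditional NER models, which are limited to predefined entities, and to large language models (LLMs), which are flexible but too large and costly for resource-constrained scenarios. Developed in collaboration with DS4DH at the University of Geneva, GLiNER-BioMed uses synthetic annotations distilled from large generative biomedical language models to achieve state-of-the-art zero-shot and few-shot performance on biomedical entity recognition tasks.
The accompanying paper is GLiNER-biomed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition, and the code is available at https://github.com/ds4dh/GLiNER-biomed.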
🚀 Quick Start
Installation
Install the official GLiNER library with pip:
pip install gliner -U
Usage
Once the GLiNER library is installed, you can load a GLiNER-biomed model and run named entity recognition:
from gliner import GLiNER
model = GLiNER.from_pretrained("Ihor/gliner-biomed-large-v1.0")
text = """
The patient, a 45-year-old male, was diagnosed with type 2 diabetes mellitus and hypertension.
He was prescribed Metformin 500mg twice daily and Lisinopril 10mg once daily.
A recent lab test showed elevated HbA1c levels at 8.2%.
"""
labels = ["Disease", "Drug", "Drug dosage", "Drug frequency", "Lab test", "Lab test value", "Demographic information"]
# Predict entities; spans scoring below the threshold are dropped
entities = model.predict_entities(text, labels, threshold=0.5)
for entity in entities:
    print(entity["text"], "=>", entity["label"])
Expected output
45-year-old male => Demographic information
type 2 diabetes mellitus => Disease
hypertension => Disease
Metformin => Drug
500mg => Drug dosage
twice daily => Drug frequency
Lisinopril => Drug
10mg => Drug dosage
once daily => Drug frequency
HbA1c levels => Lab test
8.2% => Lab test value
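For downstream use it is often convenient to collect the recognized spans per label. The sketch below does this in plain Python. It assumes each entity is a dict with at least `text` and `label` keys, as in the loop above. The `sample_entities` list is hardcoded from the expected output rather than produced by a live model run, and `group_by_label` is an illustrative helper name, not part of the GLiNER library.

```python
from collections import defaultdict


def group_by_label(entities):
    """Collect entity texts per label, preserving order of appearance."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["label"]].append(entity["text"])
    return dict(grouped)


# Hardcoded stand-in for the result of model.predict_entities(...),
# copied from the expected output above (not a live model run).
sample_entities = [
    {"text": "45-year-old male", "label": "Demographic information"},
    {"text": "type 2 diabetes mellitus", "label": "Disease"},
    {"text": "hypertension", "label": "Disease"},
    {"text": "Metformin", "label": "Drug"},
    {"text": "500mg", "label": "Drug dosage"},
    {"text": "twice daily", "label": "Drug frequency"},
    {"text": "Lisinopril", "label": "Drug"},
    {"text": "10mg", "label": "Drug dosage"},
    {"text": "once daily", "label": "Drug frequency"},
    {"text": "HbA1c levels", "label": "Lab test"},
    {"text": "8.2%", "label": "Lab test value"},
]

grouped = group_by_label(sample_entities)
for label, texts in grouped.items():
    print(f"{label}: {texts}")
```

Replacing `sample_entities` with the list returned by the model should give the same structure, since only the `text` and `label` keys are read.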
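✨ Key Features
- Flexible entity recognition: identifies arbitrary entity types, removing the restriction of traditional NER models to predefined entities.
- Strong performance: built on the GLiNER framework and trained with synthetic annotations, the models achieve state-of-the-art zero-shot and few-shot performance on biomedical entity recognition tasks.
- Resource-friendly: better suited than large language models to resource-constrained scenarios.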
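Because entity types are passed as plain strings at inference time, one loaded model can serve several extraction tasks by swapping the label list. The following is a minimal sketch that assumes only the `predict_entities(text, labels, threshold=...)` call shown in the usage section. The `LABEL_SETS` dictionary, its task names, and the `extract_for_task` helper are hypothetical names introduced for illustration.

```python
# Hypothetical task-specific label sets; any strings can serve as labels.
LABEL_SETS = {
    "medication": ["Drug", "Drug dosage", "Drug frequency"],
    "diagnosis": ["Disease"],
    "labs": ["Lab test", "Lab test value"],
}


def extract_for_task(model, text, task, threshold=0.5):
    """Run one model with the label set registered for `task`.

    `model` is any object exposing
    predict_entities(text, labels, threshold=...),
    such as a loaded GLiNER model.
    """
    if task not in LABEL_SETS:
        raise KeyError(f"Unknown task: {task!r}")
    labels = LABEL_SETS[task]
    return model.predict_entities(text, labels, threshold=threshold)


# Usage with a loaded model (needs the gliner package and a model download):
#   from gliner import GLiNER
#   model = GLiNER.from_pretrained("Ihor/gliner-biomed-large-v1.0")
#   entities = extract_for_task(model, "He was prescribed Metformin 500mg.", "medication")
```

Keeping the label sets in data rather than code means a new entity type can be tried without retraining or changing the extraction function.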
📚 Documentation
Benchmarks
We tested the models on 8 complex real-world datasets and compared them with other GLiNER models.
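The scoring metric is not stated in this section. As an illustration only, the sketch below computes entity-level precision, recall, and F1 under exact span-and-label matching, a common convention for NER evaluation. This is an assumption and not necessarily the protocol used for the benchmark above. The function name `entity_prf` and the toy gold and predicted spans are made up for the example.

```python
def entity_prf(gold, predicted):
    """Precision, recall, and F1 over sets of (start, end, label) tuples.

    A prediction counts as correct only if its start offset, end offset,
    and label all match a gold entity exactly.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data: offsets and labels are illustrative, not from a real dataset.
gold = [(0, 9, "Drug"), (10, 15, "Drug dosage"), (30, 42, "Disease")]
predicted = [(0, 9, "Drug"), (10, 15, "Drug frequency"), (30, 42, "Disease")]

precision, recall, f1 = entity_prf(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Here the second prediction has the right span but the wrong label, so it counts as both a false positive and a missed gold entity.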
Join Our Discord Community
Connect with our community on Discord for news, support, and discussion about our models. Join Discord.
📄 License
This project is licensed under the Apache-2.0 license.
📚 Citation
This work
If you use GLiNER-biomed models in your work, please cite the following paper:
@misc{yazdani2025glinerbiomedsuiteefficientmodels,
title={GLiNER-biomed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition},
author={Anthony Yazdani and Ihor Stepanov and Douglas Teodoro},
year={2025},
eprint={2504.00676},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2504.00676},
}
Previous work
@misc{zaratiana2023gliner,
title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
year={2023},
eprint={2311.08526},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@misc{stepanov2024gliner,
title={GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks},
author={Ihor Stepanov and Mykhailo Shtopko},
year={2024},
eprint={2406.12925},
archivePrefix={arXiv},
primaryClass={cs.LG}
}