🚀 GLiNER-BioMed
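GLiNER-BioMed is a suite of efficient models for open biomedical named entity recognition (NER). Built on a bidirectional transformer encoder (BERT-like), it can identify any entity type. This makes it a practical alternative both to traditional NER models, which are limited to predefined entity types, and to large language models (LLMs), which are flexible but too costly and too large for resource-constrained scenarios.
🚀 Quick Start
Installation
Install the official GLiNER library with pip: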
pip install gliner -U
Usage
Once the GLiNER library is installed, you can load a GLiNER-biomed model and run named entity recognition:
from gliner import GLiNER

# Load the pretrained GLiNER-biomed checkpoint.
model = GLiNER.from_pretrained("Ihor/gliner-biomed-bi-large-v1.0")

text = """
The patient, a 45-year-old male, was diagnosed with type 2 diabetes mellitus and hypertension.
He was prescribed Metformin 500mg twice daily and Lisinopril 10mg once daily.
A recent lab test showed elevated HbA1c levels at 8.2%.
"""

# Entity types are free-form strings chosen at inference time.
labels = ["Disease", "Drug", "Drug dosage", "Drug frequency", "Lab test", "Lab test value", "Demographic information"]

# threshold=0.5 filters out low-confidence predictions.
entities = model.predict_entities(text, labels, threshold=0.5)

for entity in entities:
    print(entity["text"], "=>", entity["label"])
Expected output:
45-year-old male => Demographic information
type 2 diabetes mellitus => Disease
hypertension => Disease
Metformin => Drug
500mg => Drug dosage
twice daily => Drug frequency
Lisinopril => Drug
10mg => Drug dosage
once daily => Drug frequency
HbA1c levels => Lab test
8.2% => Lab test value
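The list returned by `predict_entities` can be post-processed with ordinary Python. The sketch below groups predictions by label and drops low-confidence ones. It assumes each entity is a dict with `text` and `label` keys, as in the example above, plus a `score` key, which is an assumption about the library's output. `group_by_label` is a hypothetical helper, not part of GLiNER, and the sample scores are illustrative placeholders rather than real model output.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Group entity texts by label, keeping first-seen order and skipping low scores."""
    grouped = defaultdict(list)
    for entity in entities:
        # "score" is assumed to be present; fall back to 1.0 if it is not.
        if entity.get("score", 1.0) < min_score:
            continue
        grouped[entity["label"]].append(entity["text"])
    return dict(grouped)


# Hand-written sample mirroring part of the expected output above;
# the scores are illustrative placeholders, not real model output.
sample = [
    {"text": "type 2 diabetes mellitus", "label": "Disease", "score": 0.9},
    {"text": "hypertension", "label": "Disease", "score": 0.8},
    {"text": "Metformin", "label": "Drug", "score": 0.95},
    {"text": "500mg", "label": "Drug dosage", "score": 0.4},
]

grouped = group_by_label(sample, min_score=0.5)
for label, texts in grouped.items():
    print(label, "=>", texts)
```

In real use, the `entities` list from the usage example would be passed in place of `sample`.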
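If you have a large number of entity types and want to pre-embed them, refer to the following snippet:
labels = ["your entities"]
texts = ["your texts"]

# Embed the labels once, then reuse the embeddings for prediction.
entity_embeddings = model.encode_labels(labels, batch_size=8)
outputs = model.batch_predict_with_embeds(texts, entity_embeddings, labels)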
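When the corpus is large as well, the same two calls can be wrapped so that labels are embedded once and texts are processed in chunks. This is a minimal sketch, not part of the GLiNER library: `predict_in_chunks`, `chunk_size`, and `label_batch_size` are hypothetical names, and the code assumes `batch_predict_with_embeds` returns one result per input text.

```python
def predict_in_chunks(model, texts, labels, chunk_size=32, label_batch_size=8):
    """Embed labels once, then run prediction over texts chunk by chunk."""
    # Label embeddings are computed a single time and reused for every chunk.
    entity_embeddings = model.encode_labels(labels, batch_size=label_batch_size)
    outputs = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start:start + chunk_size]
        # Assumption: the call returns a list with one entry per text in `chunk`.
        outputs.extend(model.batch_predict_with_embeds(chunk, entity_embeddings, labels))
    return outputs


# Usage, with `model` loaded as in the earlier example:
# outputs = predict_in_chunks(model, texts, labels)
```

Chunking keeps peak memory bounded by `chunk_size` texts, while the label embeddings are never recomputed.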
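✨ Key Features
- Flexibility: recognizes any entity type, removing the fixed label set that constrains traditional NER models.
- Efficiency: smaller and cheaper to run than large language models, which matters in resource-constrained scenarios.
- Performance: achieves state-of-the-art zero-shot and few-shot results on biomedical entity recognition tasks.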
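📚 Documentation
The model was introduced in the paper GLiNER-biomed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition. The code is available at https://github.com/ds4dh/GLiNER-biomed.
Benchmarks
We evaluated the models on 8 complex real-world datasets and compared them with other GLiNER models.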
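The evaluation protocol is not described here, so the following is only an illustrative sketch of one common convention for scoring NER: exact-match F1 over `(start, end, label)` spans. `entity_f1` is a hypothetical helper and the spans are made-up examples; this is not the evaluation script behind the benchmark.

```python
def entity_f1(gold, predicted):
    """Exact-match F1 over (start, end, label) triples."""
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_set)
    recall = true_pos / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Made-up spans: the second prediction has the right offsets but the wrong label.
gold = [(0, 9, "Drug"), (10, 15, "Drug dosage"), (30, 42, "Disease")]
pred = [(0, 9, "Drug"), (10, 15, "Drug frequency"), (30, 42, "Disease")]

print(round(entity_f1(gold, pred), 3))  # prints 0.667
```

Under exact matching, a span counts as correct only if both its boundaries and its label agree with the gold annotation.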
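📄 License
This project is licensed under the Apache 2.0 license.
🔗 Join our Discord
Connect with our community on Discord for the latest news, support, and discussion about the models. Join Discord.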
📖 Citation
This work
If you use GLiNER-biomed models in your work, please cite:
@misc{yazdani2025glinerbiomedsuiteefficientmodels,
title={GLiNER-biomed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition},
author={Anthony Yazdani and Ihor Stepanov and Douglas Teodoro},
year={2025},
eprint={2504.00676},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2504.00676},
}
Prior work
@misc{zaratiana2023gliner,
title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
year={2023},
eprint={2311.08526},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@misc{stepanov2024gliner,
title={GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks},
author={Ihor Stepanov and Mykhailo Shtopko},
year={2024},
eprint={2406.12925},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
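📋 Model Information
| Property | Details |
|---|---|
| Model type | Named entity recognition (NER) model |
| Training data | knowledgator/GLINER-multi-task-synthetic-data, knowledgator/biomed_NER |
| Base models | microsoft/deberta-v3-base, BAAI/bge-base-en-v1.5 |
| Evaluation metric | F1 |
| Tags | NER, GLiNER, information extraction, encoder, entity recognition, biomedical |
| Task tag | Token classification |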