🚀 GLiNER-BioMed
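GLiNER-BioMed is a suite of efficient models for open biomedical named entity recognition. Built on a bidirectional transformer encoder (BERT-like), it can recognize any entity type. This makes it a practical alternative both to traditional named entity recognition (NER) models, which are limited to predefined entity types, and to large language models (LLMs), which are flexible but costly and large for resource-constrained settings.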
🚀 Quick Start
Installation
Install the official GLiNER library with pip:
pip install gliner -U
Usage
After installing the GLiNER library, you can load a GLiNER-biomed model and run named entity recognition:
from gliner import GLiNER

# Load a pretrained GLiNER-biomed checkpoint from the Hugging Face Hub
model = GLiNER.from_pretrained("Ihor/gliner-biomed-bi-large-v1.0")
text = """
The patient, a 45-year-old male, was diagnosed with type 2 diabetes mellitus and hypertension.
He was prescribed Metformin 500mg twice daily and Lisinopril 10mg once daily.
A recent lab test showed elevated HbA1c levels at 8.2%.
"""
# Entity types to extract, written as plain natural-language labels
labels = ["Disease", "Drug", "Drug dosage", "Drug frequency", "Lab test", "Lab test value", "Demographic information"]
# Keep only predictions whose confidence is at least 0.5
entities = model.predict_entities(text, labels, threshold=0.5)
for entity in entities:
    print(entity["text"], "=>", entity["label"])
Expected output:
45-year-old male => Demographic information
type 2 diabetes mellitus => Disease
hypertension => Disease
Metformin => Drug
500mg => Drug dosage
twice daily => Drug frequency
Lisinopril => Drug
10mg => Drug dosage
once daily => Drug frequency
HbA1c levels => Lab test
8.2% => Lab test value
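Each returned entity is a dictionary; the loop above reads its `text` and `label` keys. The following is a minimal post-processing sketch, assuming the dictionaries also carry a `score` key (GLiNER outputs typically do; the code falls back to 1.0 when it is absent). The hypothetical helper `group_entities` applies a stricter confidence cut-off than `threshold` and groups the surviving spans by label. The sample entities and scores are invented for illustration, not real model output.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.0):
    """Group entity texts by label, dropping entities scored below min_score.

    Hypothetical helper: assumes each entity dict has "text" and "label"
    keys and, optionally, a "score" key (treated as 1.0 when absent).
    """
    grouped = defaultdict(list)
    for ent in entities:
        if ent.get("score", 1.0) >= min_score:
            grouped[ent["label"]].append(ent["text"])
    return dict(grouped)


# Invented sample in the shape of predict_entities output (scores are illustrative)
entities = [
    {"text": "type 2 diabetes mellitus", "label": "Disease", "score": 0.95},
    {"text": "hypertension", "label": "Disease", "score": 0.91},
    {"text": "Metformin", "label": "Drug", "score": 0.97},
    {"text": "twice daily", "label": "Drug frequency", "score": 0.62},
]

print(group_entities(entities, min_score=0.7))
# {'Disease': ['type 2 diabetes mellitus', 'hypertension'], 'Drug': ['Metformin']}
```

Filtering after prediction lets you run `predict_entities` once with a permissive threshold and then try several cut-offs without re-running the model.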
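If you have a large number of entity types and want to pre-embed them, use the following snippet:
labels = ["your entities"]
texts = ["your texts"]
# Encode the labels once; the embeddings can be reused across many texts
entity_embeddings = model.encode_labels(labels, batch_size=8)
# Run batched prediction with the precomputed label embeddings
outputs = model.batch_predict_with_embeds(texts, entity_embeddings, labels)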
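Pre-embedding pays off because, in a bi-encoder, label embeddings do not depend on the input text: they can be computed once and then scored against the span representations of every new text. The sketch below is a simplified, pure-Python illustration of that idea, not GLiNER's internal code. It assumes a span matches a label when the sigmoid of the dot product of their embeddings reaches the threshold; the real model adds learned projection layers. The helper names and the toy 2-dimensional vectors are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def score_spans(span_embs, label_embs, labels, threshold=0.5):
    """Return (span_index, label, probability) for every span/label pair
    whose sigmoid(dot product) is at least `threshold`.

    Simplified illustration of bi-encoder scoring, not GLiNER's internals.
    """
    matches = []
    for i, span in enumerate(span_embs):
        for label, label_emb in zip(labels, label_embs):
            prob = sigmoid(dot(span, label_emb))
            if prob >= threshold:
                matches.append((i, label, prob))
    return matches


# Label embeddings: computed once and cached (toy vectors, hypothetical)
labels = ["Disease", "Drug"]
label_embs = [[4.0, 0.0], [0.0, 4.0]]

# Span embeddings for one text (toy vectors); another text would only need
# new span embeddings, while label_embs is reused unchanged.
span_embs = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

for span_idx, label, prob in score_spans(span_embs, label_embs, labels, threshold=0.6):
    print(span_idx, label, round(prob, 3))
# 0 Disease 0.982
# 1 Drug 0.982
```

The cost of encoding labels is paid once, while the per-text work is only the span encoding plus cheap dot products, which is why caching helps most when the label set is large.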
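✨ Key Features
- Flexibility: recognizes arbitrary entity types instead of being restricted to the predefined set of a traditional NER model.
- Efficiency: smaller and cheaper to run than large language models, which suits resource-constrained settings.
- Performance: achieves state-of-the-art zero-shot and few-shot results on biomedical entity recognition tasks.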
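📚 Documentation
The model was introduced in the paper GLiNER-biomed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition. The code is available at https://github.com/ds4dh/GLiNER-biomed.
Benchmarks
We evaluated the models on 8 complex real-world datasets and compared them with other GLiNER models.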
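The model card lists F1 as the evaluation metric but does not state the matching protocol. A common choice for NER, assumed here rather than taken from the paper, is micro-averaged exact-match F1: a predicted entity counts as correct only if its start offset, end offset, and label all match a gold entity. The function name `entity_f1` and the sample spans are hypothetical.

```python
def entity_f1(gold, pred):
    """Micro-averaged exact-match F1 over (start, end, label) tuples.

    A prediction is a true positive only if all three fields match a gold entity.
    """
    gold_set, pred_set = set(gold), set(pred)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted spans as (start, end, label)
gold = [(0, 9, "Drug"), (14, 38, "Disease"), (40, 45, "Drug dosage")]
pred = [(0, 9, "Drug"), (14, 38, "Disease"), (50, 62, "Lab test")]

print(round(entity_f1(gold, pred), 3))  # 0.667
```

Here two of three predictions are correct and two of three gold entities are found, so precision, recall, and F1 are all 2/3. Partial-overlap or label-only matching would give different numbers, so compare scores only under the same protocol.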
📄 License
This project is licensed under the Apache 2.0 license.
🔗 Join Our Discord
Connect with our community on Discord for news, support, and discussion about our models. Join Discord.
📖 Citation
This work
If you use GLiNER-biomed models in your work, please cite:
@misc{yazdani2025glinerbiomedsuiteefficientmodels,
title={GLiNER-biomed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition},
author={Anthony Yazdani and Ihor Stepanov and Douglas Teodoro},
year={2025},
eprint={2504.00676},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2504.00676},
}
Previous work
@misc{zaratiana2023gliner,
title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
year={2023},
eprint={2311.08526},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@misc{stepanov2024gliner,
title={GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks},
author={Ihor Stepanov and Mykhailo Shtopko},
year={2024},
eprint={2406.12925},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
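📋 Model Information
| Property | Details |
|---|---|
| Model type | Named entity recognition (NER) model |
| Training data | knowledgator/GLINER-multi-task-synthetic-data, knowledgator/biomed_NER |
| Base models | microsoft/deberta-v3-base, BAAI/bge-base-en-v1.5 |
| Evaluation metric | F1 |
| Tags | NER, GLiNER, information extraction, encoder, entity recognition, biomedical |
| Pipeline tag | token classification |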