🚀 GLiNER-BioMed
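GLiNER-BioMed is a suite of named entity recognition (NER) models specialized for the biomedical domain. Built on a bidirectional transformer encoder (BERT-like), it can recognize arbitrary entity types. This makes it a practical alternative to two other options. Traditional NER models are limited to predefined entity types. Large language models (LLMs) are flexible, but large and costly to run in resource-constrained settings. GLiNER-BioMed was developed in collaboration with DS4DH at the University of Geneva. It is trained on synthetic annotations distilled from large generative biomedical language models and achieves state-of-the-art zero-shot and few-shot performance on biomedical entity recognition tasks.
The accompanying paper is GLiNER-biomed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition, and the code is available at https://github.com/ds4dh/GLiNER-biomed.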
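The model can handle arbitrary entity types because it scores candidate text spans against embeddings of the requested label strings, rather than relying on a fixed set of output classes. The sketch below is a simplified, hypothetical illustration of that matching step as described in the original GLiNER paper: a sigmoid over the dot product of a span embedding and a label embedding. The helper functions and the hand-picked 2-dimensional embeddings are assumptions for illustration only. This is not the gliner library's code, and it omits span enumeration, the encoder itself, and overlap resolution.

```python
import math

# Toy, hand-picked embeddings (hypothetical). In a real model these vectors
# would be produced by the transformer encoder for text spans and label strings.
SPAN_VECS = {
    "Metformin": [2.0, -1.0],
    "hypertension": [-1.0, 2.0],
}
LABEL_VECS = {
    "Drug": [1.0, -1.0],
    "Disease": [-1.0, 1.0],
}


def sigmoid(x: float) -> float:
    """Squash a raw matching score into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def match_spans(span_vecs, label_vecs, threshold=0.5):
    """Score every (span, label) pair and keep pairs scoring above threshold."""
    matches = []
    for span, s_vec in span_vecs.items():
        for label, l_vec in label_vecs.items():
            score = sigmoid(dot(s_vec, l_vec))
            if score > threshold:
                matches.append((span, label, score))
    return matches


for span, label, score in match_spans(SPAN_VECS, LABEL_VECS):
    print(f"{span} => {label} ({score:.3f})")
```

In this toy setup a label is just another embedded vector. Supporting a new entity type therefore means adding an entry to LABEL_VECS, with no change to the scoring logic.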
🚀 Quick Start
Installation
Install the official GLiNER library with pip:
pip install gliner -U
Usage
Once the GLiNER library is installed, you can load a GLiNER-biomed model and run named entity recognition:
from gliner import GLiNER

# Load a pretrained GLiNER-biomed checkpoint from the Hugging Face Hub
model = GLiNER.from_pretrained("Ihor/gliner-biomed-large-v1.0")

text = """
The patient, a 45-year-old male, was diagnosed with type 2 diabetes mellitus and hypertension.
He was prescribed Metformin 500mg twice daily and Lisinopril 10mg once daily.
A recent lab test showed elevated HbA1c levels at 8.2%.
"""

# Entity types are free-form strings chosen at inference time
labels = ["Disease", "Drug", "Drug dosage", "Drug frequency", "Lab test", "Lab test value", "Demographic information"]

# Keep only predictions whose confidence score passes the threshold
entities = model.predict_entities(text, labels, threshold=0.5)
for entity in entities:
    print(entity["text"], "=>", entity["label"])
Expected output
45-year-old male => Demographic information
type 2 diabetes mellitus => Disease
hypertension => Disease
Metformin => Drug
500mg => Drug dosage
twice daily => Drug frequency
Lisinopril => Drug
10mg => Drug dosage
once daily => Drug frequency
HbA1c levels => Lab test
8.2% => Lab test value
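Each item returned by predict_entities is a dictionary, and the loop above reads its "text" and "label" keys. A common next step is to group the extracted mentions by entity type. The sketch below is plain Python with no GLiNER dependency. It hard-codes the (text, label) pairs from the expected output above as a stand-in for real model output. group_by_label is a hypothetical helper, not part of the gliner API.

```python
from collections import defaultdict

# Stand-in for model output: the (text, label) pairs from the expected output above.
# With a real model, pass the list returned by model.predict_entities(...) instead.
entities = [
    {"text": "45-year-old male", "label": "Demographic information"},
    {"text": "type 2 diabetes mellitus", "label": "Disease"},
    {"text": "hypertension", "label": "Disease"},
    {"text": "Metformin", "label": "Drug"},
    {"text": "500mg", "label": "Drug dosage"},
    {"text": "twice daily", "label": "Drug frequency"},
    {"text": "Lisinopril", "label": "Drug"},
    {"text": "10mg", "label": "Drug dosage"},
    {"text": "once daily", "label": "Drug frequency"},
    {"text": "HbA1c levels", "label": "Lab test"},
    {"text": "8.2%", "label": "Lab test value"},
]


def group_by_label(ents):
    """Map each label to its mentions, preserving the order they were found in."""
    grouped = defaultdict(list)
    for ent in ents:
        grouped[ent["label"]].append(ent["text"])
    return dict(grouped)


grouped = group_by_label(entities)
for label, mentions in grouped.items():
    print(f"{label}: {', '.join(mentions)}")
```

Because dictionaries keep insertion order, the labels print in the order each type first appears in the text.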
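✨ Key Features
- Flexible entity recognition: recognizes arbitrary entity types. Traditional NER models, by contrast, can only recognize predefined entities.
- Strong performance: built on the GLiNER framework and trained with synthetic annotations, it achieves state-of-the-art zero-shot and few-shot performance on biomedical entity recognition tasks.
- Resource-friendly: better suited than large language models to resource-constrained settings.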
📚 Documentation
Benchmarks
We evaluated the model on eight complex real-world datasets and compared it with other GLiNER models.
Join our Discord community
Connect with our community on Discord for the latest news about the models, support, and discussion. Click Discord to join.
📄 License
This project is licensed under the Apache-2.0 license.
📚 Citation
This work
If you use GLiNER-biomed in your work, please cite the following paper:
@misc{yazdani2025glinerbiomedsuiteefficientmodels,
title={GLiNER-biomed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition},
author={Anthony Yazdani and Ihor Stepanov and Douglas Teodoro},
year={2025},
eprint={2504.00676},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2504.00676},
}
Prior work
@misc{zaratiana2023gliner,
title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
year={2023},
eprint={2311.08526},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@misc{stepanov2024gliner,
title={GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks},
author={Ihor Stepanov and Mykhailo Shtopko},
year={2024},
eprint={2406.12925},
archivePrefix={arXiv},
primaryClass={cs.LG}
}