gliner-biomed-bi-large-v1.0 open-source model: free deployment for recognizing many types of biomedical entities


Gliner Biomed Bi Large V1.0

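Developed by Ihor

GLiNER-BioMed is a suite of efficient, open NER models built on the GLiNER framework and designed specifically for the biomedical domain. It can recognize many types of biomedical entities.

Sequence labeling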

PyTorch

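English | License: Apache-2.0 | #Biomedical NER #Zero-shot learning #Multi-entity recognition

Downloads: 56

Release date: 2/19/2025

Model overview

Trained on synthetic annotations distilled from large generative biomedical language models, the model achieves state-of-the-art zero-shot and few-shot performance on biomedical entity recognition tasks.

Model features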

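Efficient and open

Overcomes the main limitation of traditional NER models, which are restricted to predefined entity types, while avoiding the high cost and large size of large language models.

Biomedical-specific

Designed specifically for the biomedical domain and able to recognize many biomedical entity types.

Zero-shot and few-shot performance

Achieves state-of-the-art zero-shot and few-shot performance on biomedical entity recognition tasks.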

Model capabilities

Biomedical entity recognition

Multi-type entity recognition

Zero-shot recognition

Few-shot recognition

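Use cases

Healthcare

Clinical record analysis

Extracts key information such as diseases, drugs, and dosages from medical records.

Identifies diseases such as type 2 diabetes mellitus and hypertension, and drugs such as metformin and lisinopril.

Lab result parsing

Extracts test names and result values from lab reports.

Identifies lab tests such as HbA1c levels and result values such as 8.2%.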
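The model returns a flat list of entities; linking each dosage or frequency to its drug is a downstream step that the card does not describe. The sketch below is one simple heuristic, assuming each prediction is a dict with `text`, `label`, and a character offset `start`, and using the label names from the usage example in this card. The function `pair_drug_attributes` is hypothetical, and the entities are built by hand with `str.find` rather than taken from the model.

```python
from typing import Dict, List


def pair_drug_attributes(entities: List[dict]) -> List[Dict[str, str]]:
    """Attach each dosage/frequency to the closest preceding Drug (hypothetical heuristic)."""
    records: List[Dict[str, str]] = []
    for ent in sorted(entities, key=lambda e: e["start"]):  # walk entities in text order
        if ent["label"] == "Drug":
            records.append({"drug": ent["text"]})
        elif ent["label"] in ("Drug dosage", "Drug frequency") and records:
            key = "dosage" if ent["label"] == "Drug dosage" else "frequency"
            records[-1].setdefault(key, ent["text"])  # keep the first value seen per drug
    return records


text = "He was prescribed Metformin 500mg twice daily and Lisinopril 10mg once daily."
# Hand-built entities with character offsets found via str.find (not model output).
spans = [("Metformin", "Drug"), ("500mg", "Drug dosage"), ("twice daily", "Drug frequency"),
         ("Lisinopril", "Drug"), ("10mg", "Drug dosage"), ("once daily", "Drug frequency")]
entities = [{"text": t, "label": l, "start": text.find(t)} for t, l in spans]
print(pair_drug_attributes(entities))
```

Nearest-preceding-drug matching works for simple prescription sentences like this one; text that lists dosages before drug names would need a different rule.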

🚀 GLiNER-BioMed

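GLiNER-BioMed is a suite of efficient models for open biomedical named entity recognition. Built on a bidirectional transformer encoder (BERT-like), it can recognize any entity type, offering a practical alternative to both traditional named entity recognition (NER) models and large language models (LLMs). Traditional NER models are limited to predefined entity types, while LLMs, though flexible, are costly and large for resource-constrained settings.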
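The GLiNER paper cited below frames NER as matching candidate text spans against entity-type embeddings, which is what lets one encoder handle arbitrary labels. A minimal sketch of the span-enumeration step follows; `enumerate_spans` and the `max_width` cap are illustrative assumptions, not code from the gliner library.

```python
from typing import List, Tuple


def enumerate_spans(tokens: List[str], max_width: int = 12) -> List[Tuple[int, int]]:
    """Return every (start, end) token span, end inclusive, at most max_width tokens long."""
    spans = []
    for start in range(len(tokens)):
        # A span may neither run past the last token nor exceed max_width tokens.
        for end in range(start, min(start + max_width, len(tokens))):
            spans.append((start, end))
    return spans


tokens = "He was prescribed Metformin".split()
spans = enumerate_spans(tokens, max_width=2)
for s, e in spans:
    print(" ".join(tokens[s:e + 1]))
```

Each candidate span would then be scored against every label, so capping the width keeps the number of candidates roughly linear in sentence length.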

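🚀 Quick start

Installation

Install the official GLiNER library with pip:

pip install gliner -U

Usage

Once the GLiNER library is installed, you can load the GLiNER-biomed model and run named entity recognition: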

from gliner import GLiNER

model = GLiNER.from_pretrained("Ihor/gliner-biomed-bi-large-v1.0")

text = """
The patient, a 45-year-old male, was diagnosed with type 2 diabetes mellitus and hypertension.
He was prescribed Metformin 500mg twice daily and Lisinopril 10mg once daily. 
A recent lab test showed elevated HbA1c levels at 8.2%.
"""

labels = ["Disease", "Drug", "Drug dosage", "Drug frequency", "Lab test", "Lab test value", "Demographic information"]

entities = model.predict_entities(text, labels, threshold=0.5)

for entity in entities:
    print(entity["text"], "=>", entity["label"])

Expected output:

45-year-old male => Demographic information
type 2 diabetes mellitus => Disease
hypertension => Disease
Metformin => Drug
500mg => Drug dosage
twice daily => Drug frequency
Lisinopril => Drug
10mg => Drug dosage
once daily => Drug frequency
HbA1c levels => Lab test
8.2% => Lab test value
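The loop above reads only the `text` and `label` keys of each prediction. The sketch below assumes each dict also carries a confidence `score`, and shows how to group results by label while applying a stricter cutoff than the `threshold` passed to `predict_entities`, without rerunning the model. `group_by_label` is a hypothetical helper, and the sample scores are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_label(entities: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Collect entity texts under their label, dropping those scored below min_score."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ent in entities:
        # Entities without a "score" key are kept (treated as fully confident).
        if ent.get("score", 1.0) >= min_score:
            grouped[ent["label"]].append(ent["text"])
    return dict(grouped)


# Hand-written sample in the assumed output format; the scores are made up.
sample = [
    {"text": "Metformin", "label": "Drug", "score": 0.93},
    {"text": "Lisinopril", "label": "Drug", "score": 0.91},
    {"text": "500mg", "label": "Drug dosage", "score": 0.42},
]
print(group_by_label(sample, min_score=0.5))
```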

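If you have a large number of entity types and want to pre-embed them, use the following snippet:

from gliner import GLiNER

model = GLiNER.from_pretrained("Ihor/gliner-biomed-bi-large-v1.0")

labels = ["your entities"]  # entity types to pre-embed
texts = ["your texts"]      # input documents

# Encode the labels once, in batches of 8, and reuse the embeddings for every text.
entity_embeddings = model.encode_labels(labels, batch_size=8)

outputs = model.batch_predict_with_embeds(texts, entity_embeddings, labels)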
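In a bi-encoder, label embeddings do not depend on the input text, so they can be computed once and reused across a whole corpus; the two base encoders listed under model information suggest this model follows that design. The sketch below illustrates only the amortization, using a hypothetical `LabelCache` class and a toy encoder in place of a transformer. It is not how the gliner library stores embeddings internally.

```python
from typing import Callable, Dict, List

Vector = List[float]


class LabelCache:
    """Hypothetical helper: encode each label once, then reuse its vector."""

    def __init__(self, encode: Callable[[str], Vector]) -> None:
        self.encode = encode
        self.calls = 0                      # number of real encoder invocations
        self._store: Dict[str, Vector] = {}

    def embeddings(self, labels: List[str]) -> List[Vector]:
        out = []
        for label in labels:
            if label not in self._store:    # only unseen labels reach the encoder
                self._store[label] = self.encode(label)
                self.calls += 1
            out.append(self._store[label])
        return out


def score(span_vec: Vector, label_vecs: List[Vector]) -> List[float]:
    """Dot-product similarity between one span vector and every label vector."""
    return [sum(s * l for s, l in zip(span_vec, lv)) for lv in label_vecs]


# Toy encoder standing in for a transformer label encoder (assumption).
def toy_encode(label: str) -> Vector:
    return [float(len(label)), float(label.count(" ")) + 1.0]


labels = ["Drug", "Drug dosage"]
cache = LabelCache(toy_encode)
for text in ["text one", "text two", "text three"]:
    label_vecs = cache.embeddings(labels)   # the encoder runs only on the first pass
    print(text, score([1.0, 0.0], label_vecs))
print("encoder calls:", cache.calls)
```

With many labels and many documents, the label side costs one encoding per label instead of one per (label, document) pair.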

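✨ Key features

Flexibility: recognizes any entity type, removing the restriction of traditional NER models to predefined entities.
Efficiency: better suited than large language models to resource-constrained settings.
Performance: state-of-the-art zero-shot and few-shot results on biomedical entity recognition tasks.

📚 Documentation

The model was introduced in the paper GLiNER-biomed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition. Code is available at https://github.com/ds4dh/GLiNER-biomed.

Benchmarks

We evaluated the models on eight complex real-world datasets and compared them with other GLiNER models.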

Model	F1 score	Macro-averaged F1	Macro median F1	Weighted F1
Large models
NuNER Zero	40.87	21.79	13.94	33.67
NuNER Zero span	40.26	22.51	14.27	32.52
GLiNER bio v0.1	42.34	27.10	24.44	38.32
GLiNER bio v0.2	38.66	25.36	17.02	32.42
GLiNER v1.0	47.77	29.60	21.13	40.78
GLiNER v2.0	37.38	21.42	15.44	33.11
GLiNER v2.1	48.04	29.75	28.20	43.43
GLiNER news v2.1	48.99	31.79	33.77	45.13
GLiNER v2.5	53.81	35.22	35.65	51.57
GLiNER-biomed	59.77	40.67	42.65	58.40
GLiNER-biomed-bi	54.90	35.78	31.66	50.46
Base models
GLiNER v1.0	41.61	24.98	10.27	31.59
GLiNER v2.0	34.33	24.48	22.01	30.58
GLiNER v2.1	40.25	25.26	14.41	32.64
GLiNER news v2.1	41.59	27.16	17.74	34.44
GLiNER v2.5	46.49	30.93	25.26	44.68
GLiNER-biomed	54.37	36.20	41.61	53.05
GLiNER-biomed-bi	58.31	35.22	32.39	54.91
Small models
GLiNER v1.0	40.99	22.81	7.86	31.15
GLiNER v2.0	33.55	21.12	15.76	28.78
GLiNER v2.1	38.45	23.25	10.92	30.67
GLiNER news v2.1	39.15	24.96	14.48	33.10
GLiNER v2.5	38.21	28.53	18.01	36.88
GLiNER-biomed	52.53	34.49	38.17	50.87
GLiNER-biomed-bi	56.93	33.88	33.61	53.12
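The card does not define how the four columns are aggregated (over entity types, datasets, or both). Under a common convention, assumed here, micro F1 pools true/false positive and false negative counts before computing F1, macro F1 averages per-class F1 scores, median F1 takes their middle value, and weighted F1 weights each class by its number of gold entities. The sketch computes all four from per-class counts; the counts are invented, not taken from the benchmark.

```python
from statistics import mean, median
from typing import Dict, Tuple

# Per-class confusion counts: (true positives, false positives, false negatives).
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def summarize(counts: Dict[str, Counts]) -> Dict[str, float]:
    per_class = {name: f1(*c) for name, c in counts.items()}
    # Support = number of gold entities in the class = TP + FN.
    support = {name: c[0] + c[2] for name, c in counts.items()}
    total = sum(support.values())
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    weighted = sum(per_class[n] * support[n] for n in counts) / total if total else 0.0
    return {
        "micro_f1": f1(tp, fp, fn),               # pool counts, then compute F1
        "macro_f1": mean(per_class.values()),     # unweighted mean over classes
        "median_f1": median(per_class.values()),  # middle per-class score
        "weighted_f1": weighted,                  # mean weighted by class support
    }


# Invented counts for three entity types, purely for illustration.
toy = {"Disease": (8, 2, 2), "Drug": (1, 1, 3), "Lab test": (3, 1, 0)}
print({k: round(v, 4) for k, v in summarize(toy).items()})
```

The gap between macro and median F1 shows how much a few very weak classes (or datasets) pull the average down.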

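📄 License

This project is licensed under the Apache 2.0 license.

🔗 Join our Discord

Connect with our community on Discord for the latest news about the models, support, and discussion.

📖 Citation

This work

If you use GLiNER-biomed models in your work, please cite: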

@misc{yazdani2025glinerbiomedsuiteefficientmodels,
      title={GLiNER-biomed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition},
      author={Anthony Yazdani and Ihor Stepanov and Douglas Teodoro},
      year={2025},
      eprint={2504.00676},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.00676},
}

Previous work

@misc{zaratiana2023gliner,
      title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
      author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
      year={2023},
      eprint={2311.08526},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@misc{stepanov2024gliner,
      title={GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks},
      author={Ihor Stepanov and Mykhailo Shtopko},
      year={2024},
      eprint={2406.12925},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

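📋 Model information

Property	Details
Model type	Named entity recognition (NER) model
Training data	knowledgator/GLINER-multi-task-synthetic-data, knowledgator/biomed_NER
Base models	microsoft/deberta-v3-base, BAAI/bge-base-en-v1.5
Evaluation metric	F1
Tags	NER, GLiNER, information extraction, encoder, entity recognition, biomedical
Task tag	token-classification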