🚀 xomad/gliner-model-merge-large-v1.0
xomad/gliner-model-merge-large-v1.0 is built on the pre-trained model knowledgator/gliner-multitask-large-v0.5 and explores what model-merging techniques can achieve. Merging lifts performance by 3.25 percentage points, raising the F1 score from 0.6276 to 0.6601.
The model was trained only on datasets with commercially friendly licenses, ensuring broad applicability under the Apache-2.0 license. The following datasets were used during training:
- knowledgator/GLINER-multi-task-synthetic-data
- EmergentMethods/AskNews-NER-v0
- urchade/pile-mistral-v0.1
- MultiCoNER/multiconer_v2
- DFKI-SLT/few-nerd
🚀 Quick Start
This model is built on the pre-trained model knowledgator/gliner-multitask-large-v0.5 and gains a significant performance boost from model merging. The sections below cover installation and usage.
✨ Key Features
- Improved performance: model merging raises the F1 score from 0.6276 to 0.6601.
- Commercially friendly: trained only on datasets with commercially friendly licenses, suitable for use under the Apache-2.0 license.
- Multi-dataset training: trained on several public datasets, including knowledgator/GLINER-multi-task-synthetic-data and EmergentMethods/AskNews-NER-v0.
📦 Installation
To use this model, install the GLiNER Python library:
```bash
pip install gliner
```
Once the GLiNER library is installed, import the GLiNER class and load this model with GLiNER.from_pretrained.
💻 Usage Example
Basic usage:
```python
from gliner import GLiNER

# Load the merged model from the Hugging Face Hub
model = GLiNER.from_pretrained("xomad/gliner-model-merge-large-v1.0")

text = """
Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975 to develop and sell BASIC interpreters for the Altair 8800. During his career at Microsoft, Gates held the positions of chairman, chief executive officer, president and chief software architect, while also being the largest individual shareholder until May 2014.
"""

labels = ["founder", "computer", "software", "position", "date", "company"]

entities = model.predict_entities(text, labels)

for entity in entities:
    print(entity["text"], "=>", entity["label"])
```
Output:
```
Microsoft => company
Bill Gates => founder
Paul Allen => founder
April 4, 1975 => date
BASIC => software
Altair 8800 => computer
Microsoft => company
chairman => position
chief executive officer => position
president => position
chief software architect => position
May 2014 => date
```
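By default, GLiNER keeps every span scoring above its detection threshold. Continuing the example above, here is a minimal sketch of tuning that precision/recall trade-off, assuming the standard GLiNER predict_entities signature with its optional threshold argument and per-entity score field (both part of the GLiNER library, not specific to this model):

```python
# A higher threshold keeps only high-confidence spans (more precision);
# a lower one surfaces more candidate entities (more recall).
entities = model.predict_entities(text, labels, threshold=0.7)

for entity in entities:
    print(entity["text"], "=>", entity["label"], f"({entity['score']:.2f})")
```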
📚 Documentation
⚙️ Fine-tuning process
The process starts from the base model knowledgator/gliner-multitask-large-v0.5. Our model xomad/gliner-model-merge-large-v1.0 is fine-tuned separately on each of the datasets listed above, saving several checkpoints along the way. All of these checkpoints are pooled, and the Model soups technique is applied to produce different merged models (a sketch of the averaging step follows the list below):
- uniform_merged
- greedy_on_random
- greedy_on_sorted
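A minimal sketch of the soup-averaging step, assuming the checkpoints are ordinary PyTorch state dicts (the file paths and the evaluate_f1 helper are hypothetical; the uniform soup averages every checkpoint, while the greedy variants follow Wortsman et al., 2022, differing only in whether the pool is visited in random or F1-sorted order):

```python
import torch

def uniform_soup(paths):
    """uniform_merged: average the weights of every checkpoint in the pool."""
    state_dicts = [torch.load(p, map_location="cpu") for p in paths]
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }

def greedy_soup(paths, evaluate_f1):
    """greedy_on_random / greedy_on_sorted: keep a checkpoint in the soup
    only if adding it does not hurt held-out F1 (evaluate_f1 is a
    hypothetical helper that scores a merged state dict)."""
    soup = [paths[0]]
    best_f1 = evaluate_f1(uniform_soup(soup))
    for path in paths[1:]:
        candidate_f1 = evaluate_f1(uniform_soup(soup + [path]))
        if candidate_f1 >= best_f1:
            soup.append(path)
            best_f1 = candidate_f1
    return uniform_soup(soup)
```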
We then apply the WiSE-FT merging technique to model pairs selected from the three merged models above and the original model, producing the wise_ft_merged model. This completes the first fine-tuning stage.
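A minimal sketch of the WiSE-FT step, assuming plain key-by-key interpolation between the original weights and a merged or fine-tuned model's weights, following Wortsman et al. (CVPR 2022); the mixing coefficient alpha below is an illustrative choice, not the value used for this release:

```python
import torch

def wise_ft(original_sd, finetuned_sd, alpha=0.5):
    """WiSE-FT: linearly interpolate between the original ("zero-shot")
    state dict and the fine-tuned one; alpha=1.0 recovers the fine-tuned
    model, alpha=0.0 the original."""
    return {
        key: (1 - alpha) * original_sd[key].float() + alpha * finetuned_sd[key].float()
        for key in original_sd
    }
```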
In the second stage, the process is repeated with wise_ft_merged as the new starting point to produce the final model. The full fine-tuning pipeline is illustrated in the figure below.
The performance of the fine-tuned checkpoint pool and the merged models is evaluated on the CrossNER and TwitterNER benchmarks and plotted in the two figures below (crossner_f1 and other_f1, respectively).
Stage 1 fine-tuning plot:
Stage 2 fine-tuning plot:
📊 Benchmarks
Performance on several zero-shot NER benchmarks (CrossNER, mit-movie, and mit-restaurant), with figures from https://huggingface.co/knowledgator/gliner-multitask-large-v0.5:

| Model | F1 score |
|---|---|
| xomad/gliner-model-merge-large-v1.0 | 0.6601 |
| knowledgator/gliner-multitask-v0.5 | 0.6276 |
| numind/NuNER_Zero-span | 0.6196 |
| gliner-community/gliner_large-v2.5 | 0.615 |
| EmergentMethods/gliner_large_news-v2.1 | 0.5876 |
| urchade/gliner_large-v2.1 | 0.5754 |
Detailed performance per dataset:

| Model | Dataset | Precision | Recall | F1 | F1 (decimal) |
|---|---|---|---|---|---|
| xomad/gliner-model-merge-large-v1.0 | CrossNER_AI | 62.66% | 57.48% | 59.96% | 0.5996 |
| | CrossNER_literature | 73.28% | 66.42% | 69.68% | 0.6968 |
| | CrossNER_music | 74.89% | 70.67% | 72.72% | 0.7272 |
| | CrossNER_politics | 79.46% | 77.57% | 78.51% | 0.7851 |
| | CrossNER_science | 74.72% | 70.24% | 72.41% | 0.7241 |
| | mit-movie | 67.33% | 57.89% | 62.25% | 0.6225 |
| | mit-restaurant | 54.94% | 40.41% | 46.57% | 0.4657 |
| | Average | | | | 0.6601 |
| numind/NuNER_Zero-span | CrossNER_AI | 63.82% | 56.82% | 60.12% | 0.6012 |
| | CrossNER_literature | 73.53% | 58.06% | 64.89% | 0.6489 |
| | CrossNER_music | 72.69% | 67.40% | 69.95% | 0.6995 |
| | CrossNER_politics | 77.28% | 68.69% | 72.73% | 0.7273 |
| | CrossNER_science | 70.08% | 63.12% | 66.42% | 0.6642 |
| | mit-movie | 63.00% | 48.88% | 55.05% | 0.5505 |
| | mit-restaurant | 54.81% | 37.62% | 44.62% | 0.4462 |
| | Average | | | | 0.6196 |
| knowledgator/gliner-multitask-v0.5 | CrossNER_AI | 51.00% | 51.11% | 51.05% | 0.5105 |
| | CrossNER_literature | 72.65% | 65.62% | 68.96% | 0.6896 |
| | CrossNER_music | 74.91% | 73.70% | 74.30% | 0.7430 |
| | CrossNER_politics | 78.84% | 77.71% | 78.27% | 0.7827 |
| | CrossNER_science | 69.20% | 65.48% | 67.29% | 0.6729 |
| | mit-movie | 61.29% | 52.59% | 56.60% | 0.5660 |
| | mit-restaurant | 50.65% | 38.13% | 43.51% | 0.4351 |
| | Average | | | | 0.6276 |
| gliner-community/gliner_large-v2.5 | CrossNER_AI | 50.85% | 63.03% | 56.29% | 0.5629 |
| | CrossNER_literature | 64.92% | 67.21% | 66.04% | 0.6604 |
| | CrossNER_music | 70.88% | 73.10% | 71.97% | 0.7197 |
| | CrossNER_politics | 72.67% | 72.93% | 72.80% | 0.7280 |
| | CrossNER_science | 61.71% | 68.85% | 65.08% | 0.6508 |
| | mit-movie | 54.63% | 52.83% | 53.71% | 0.5371 |
| | mit-restaurant | 47.99% | 42.13% | 44.87% | 0.4487 |
| | Average | | | | 0.6154 |
| urchade/gliner_large-v2.1 | CrossNER_AI | 54.98% | 52.00% | 53.45% | 0.5345 |
| | CrossNER_literature | 59.33% | 56.47% | 57.87% | 0.5787 |
| | CrossNER_music | 67.39% | 66.77% | 67.08% | 0.6708 |
| | CrossNER_politics | 66.07% | 63.76% | 64.90% | 0.6490 |
| | CrossNER_science | 61.45% | 62.56% | 62.00% | 0.6200 |
| | mit-movie | 55.94% | 47.36% | 51.29% | 0.5129 |
| | mit-restaurant | 53.34% | 40.83% | 46.25% | 0.4625 |
| | Average | | | | 0.5754 |
| EmergentMethods/gliner_large_news-v2.1 | CrossNER_AI | 59.60% | 54.55% | 56.96% | 0.5696 |
| | CrossNER_literature | 65.41% | 56.16% | 60.44% | 0.6044 |
| | CrossNER_music | 67.47% | 63.08% | 65.20% | 0.6520 |
| | CrossNER_politics | 66.05% | 60.07% | 62.92% | 0.6292 |
| | CrossNER_science | 68.44% | 63.57% | 65.92% | 0.6592 |
| | mit-movie | 65.85% | 49.59% | 56.57% | 0.5657 |
| | mit-restaurant | 54.71% | 35.94% | 43.38% | 0.4338 |
| | Average | | | | 0.5876 |
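As a quick sanity check on the tables above: each F1 is the harmonic mean of its precision and recall, and each reported average is the plain mean of the seven per-dataset F1 scores. A short verification sketch using the xomad/gliner-model-merge-large-v1.0 rows:

```python
# Per-dataset F1 scores for xomad/gliner-model-merge-large-v1.0 (table above).
f1_scores = [0.5996, 0.6968, 0.7272, 0.7851, 0.7241, 0.6225, 0.4657]
print(sum(f1_scores) / len(f1_scores))  # -> 0.6601..., the reported average

# F1 from precision and recall, e.g. the CrossNER_AI row:
p, r = 0.6266, 0.5748
print(2 * p * r / (p + r))  # -> 0.5996..., matching the table
```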
Authors
Hoan Nguyen, from xomad.com
Citation
```bibtex
@misc{wortsman2022modelsoupsaveragingweights,
  title={Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time},
  author={Mitchell Wortsman and Gabriel Ilharco and Samir Yitzhak Gadre and Rebecca Roelofs and Raphael Gontijo-Lopes and Ari S. Morcos and Hongseok Namkoong and Ali Farhadi and Yair Carmon and Simon Kornblith and Ludwig Schmidt},
  year={2022},
  eprint={2203.05482},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2203.05482},
}

@InProceedings{Wortsman_2022_CVPR,
  author = {Wortsman, Mitchell and Ilharco, Gabriel and Kim, Jong Wook and Li, Mike and Kornblith, Simon and Roelofs, Rebecca and Lopes, Raphael Gontijo and Hajishirzi, Hannaneh and Farhadi, Ali and Namkoong, Hongseok and Schmidt, Ludwig},
  title = {Robust Fine-Tuning of Zero-Shot Models},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2022},
  pages = {7959-7971}
}

@misc{stepanov2024gliner,
  title={GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks},
  author={Ihor Stepanov and Mykhailo Shtopko},
  year={2024},
  eprint={2406.12925},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

@misc{zaratiana2023gliner,
  title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
  author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
  year={2023},
  eprint={2311.08526},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
📄 License
This model is released under the Apache-2.0 license.