GLiNER Small v2.5
Model Overview
GLiNER provides a practical alternative to traditional NER models, which are limited to predefined entity types, and to Large Language Models (LLMs), which are flexible but costly and bulky in resource-constrained scenarios.
Model Features
Flexible entity recognition
Recognizes any user-defined entity type, not just predefined entities.
Efficient performance
Smaller and more efficient than large language models in resource-constrained scenarios.
Multilingual support
Supports named entity recognition in multiple languages.
Model Capabilities
Named entity recognition
Multilingual text processing
Custom entity type recognition
Use Cases
Information Extraction
Person extraction
Identify and extract person names and related information from text.
e.g. recognizing 'Cristiano Ronaldo dos Santos Aveiro' as a 'person' entity
Award extraction
Identify and extract award names from text.
e.g. recognizing 'Ballon d'Or' as an 'award' entity
Date extraction
Identify and extract dates from text.
e.g. recognizing '5 February 1985' as a 'date' entity
Sports News Analysis
Team extraction
Identify and extract team names from sports news.
e.g. recognizing 'Al Nassr' and 'Portugal national team' as 'teams' entities
Competition extraction
Identify and extract competition names from sports news.
e.g. recognizing 'UEFA Champions Leagues' and 'UEFA European Championship' as 'competitions' entities
🚀 GLiNER - Generalist Named Entity Recognition Model
GLiNER is a Named Entity Recognition (NER) model capable of identifying any entity type using a bidirectional transformer encoder (BERT-like). It provides a practical alternative to traditional NER models, which are limited to predefined entities, and to Large Language Models (LLMs), which, despite their flexibility, are costly and large in resource-constrained scenarios.
🚀 Quick Start
GLiNER is a powerful named entity recognition model that can identify a wide range of entity types. The sections below show how to install and use it.
✨ Key Features
- Open entity type recognition: identifies any entity type, going beyond the predefined-entity limitation of traditional NER models.
- Cost-effective: offers better cost efficiency than large language models in resource-constrained scenarios.
- Flexible: built on a bidirectional transformer encoder (BERT-like), giving it strong flexibility.
📦 Installation
To use this model, you must install the GLiNER Python library:
!pip install gliner -U
💻 Usage Examples
Basic usage
from gliner import GLiNER
model = GLiNER.from_pretrained("gliner-community/gliner_small-v2.5", load_tokenizer=True)
text = """
Cristiano Ronaldo dos Santos Aveiro (Portuguese pronunciation: [kɾiʃˈtjɐnu ʁɔˈnaldu]; born 5 February 1985) is a Portuguese professional footballer who plays as a forward for and captains both Saudi Pro League club Al Nassr and the Portugal national team. Widely regarded as one of the greatest players of all time, Ronaldo has won five Ballon d'Or awards,[note 3] a record three UEFA Men's Player of the Year Awards, and four European Golden Shoes, the most by a European player. He has won 33 trophies in his career, including seven league titles, five UEFA Champions Leagues, the UEFA European Championship and the UEFA Nations League. Ronaldo holds the records for most appearances (183), goals (140) and assists (42) in the Champions League, goals in the European Championship (14), international goals (128) and international appearances (205). He is one of the few players to have made over 1,200 professional career appearances, the most by an outfield player, and has scored over 850 official senior career goals for club and country, making him the top goalscorer of all time.
"""
labels = ["person", "award", "date", "competitions", "teams"]
entities = model.predict_entities(text, labels)
for entity in entities:
    print(entity["text"], "=>", entity["label"])
Example output
Cristiano Ronaldo dos Santos Aveiro => person
5 February 1985 => date
Al Nassr => teams
Portugal national team => teams
Ballon d'Or => award
UEFA Men's Player of the Year Awards => award
European Golden Shoes => award
UEFA Champions Leagues => competitions
UEFA European Championship => competitions
UEFA Nations League => competitions
Champions League => competitions
European Championship => competitions
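Each prediction returned by predict_entities is a plain dict; the keys assumed below ("text", "label", "score") follow the output format shown above, with "score" taken to be the model's confidence value. A minimal post-processing sketch (pure Python, no GLiNER dependency) groups predictions by label and filters on confidence:

```python
from collections import defaultdict

def group_by_label(entities, min_score=0.0):
    """Group predicted entity texts by label, keeping only those at or above min_score."""
    grouped = defaultdict(list)
    for ent in entities:
        # "score" is assumed to hold the model's confidence; default to 1.0 if absent.
        if ent.get("score", 1.0) >= min_score:
            grouped[ent["label"]].append(ent["text"])
    return dict(grouped)

# Hypothetical predictions shaped like the output above, with illustrative scores.
predictions = [
    {"text": "Cristiano Ronaldo dos Santos Aveiro", "label": "person", "score": 0.98},
    {"text": "5 February 1985", "label": "date", "score": 0.95},
    {"text": "Al Nassr", "label": "teams", "score": 0.91},
    {"text": "Ballon d'Or", "label": "award", "score": 0.88},
]
print(group_by_label(predictions, min_score=0.9))
# -> {'person': ['Cristiano Ronaldo dos Santos Aveiro'], 'date': ['5 February 1985'], 'teams': ['Al Nassr']}
```

Raising min_score drops lower-confidence spans (here 'Ballon d'Or' at 0.88), which is a common way to trade recall for precision when using zero-shot labels.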
📚 Documentation
Named entity recognition benchmark results
Below is a comparison between previous versions of the model and the current one:
Results on other datasets
Model | Dataset | Precision | Recall | F1 |
---|---|---|---|---|
gliner-community/gliner_small-v2.5 | ACE 2004 | 35.18% | 22.81% | 27.67% |
ACE 2005 | 35.89% | 22.39% | 27.58% | |
AnatEM | 49.12% | 31.31% | 38.24% | |
Broad Tweet Corpus | 59.51% | 77.85% | 67.46% | |
CoNLL 2003 | 63.16% | 70.43% | 66.60% | |
FabNER | 23.78% | 22.55% | 23.15% | |
FindVehicle | 37.46% | 40.06% | 38.72% | |
GENIA_NER | 45.90% | 54.11% | 49.67% | |
HarveyNER | 13.20% | 32.58% | 18.78% | |
MultiNERD | 45.87% | 87.01% | 60.07% | |
Ontonotes | 23.05% | 41.16% | 29.55% | |
PolyglotNER | 31.88% | 67.22% | 43.25% | |
TweetNER7 | 40.98% | 39.91% | 40.44% | |
WikiANN en | 55.35% | 60.06% | 57.61% | |
WikiNeural | 64.52% | 86.24% | 73.81% | |
bc2gm | 51.70% | 49.99% | 50.83% | |
bc4chemd | 30.78% | 57.56% | 40.11% | |
bc5cdr | 63.48% | 69.65% | 66.42% | |
ncbi | 63.36% | 66.67% | 64.97% | |
Average | | | 46.58% | |
------------------------------------ | --------------------- | ----------- | -------- | ---------- |
urchade/gliner_small-v2.1 | ACE 2004 | 38.89% | 23.53% | 29.32% |
ACE 2005 | 42.09% | 26.82% | 32.76% | |
AnatEM | 63.71% | 19.45% | 29.80% | |
Broad Tweet Corpus | 57.01% | 70.49% | 63.04% | |
CoNLL 2003 | 57.11% | 62.66% | 59.76% | |
FabNER | 32.41% | 12.33% | 17.87% | |
FindVehicle | 43.47% | 33.02% | 37.53% | |
GENIA_NER | 61.03% | 37.25% | 46.26% | |
HarveyNER | 23.12% | 15.16% | 18.32% | |
MultiNERD | 43.63% | 83.60% | 57.34% | |
Ontonotes | 23.25% | 35.41% | 28.07% | |
PolyglotNER | 29.47% | 64.41% | 40.44% | |
TweetNER7 | 44.78% | 30.83% | 36.52% | |
WikiANN en | 52.58% | 58.31% | 55.30% | |
WikiNeural | 53.38% | 82.19% | 64.72% | |
bc2gm | 66.64% | 30.56% | 41.90% | |
bc4chemd | 42.01% | 56.03% | 48.02% | |
bc5cdr | 72.03% | 58.58% | 64.61% | |
ncbi | 68.88% | 46.71% | 55.67% | |
Average | | | 43.54% | |
------------------------------------ | --------------------- | ----------- | -------- | ---------- |
EmergentMethods/gliner_small-v2.1 | ACE 2004 | 39.92% | 17.50% | 24.34% |
ACE 2005 | 38.53% | 16.58% | 23.18% | |
AnatEM | 55.95% | 25.69% | 35.22% | |
Broad Tweet Corpus | 66.63% | 72.00% | 69.21% | |
CoNLL 2003 | 62.89% | 58.96% | 60.86% | |
FabNER | 32.76% | 13.33% | 18.95% | |
FindVehicle | 42.93% | 43.20% | 43.06% | |
GENIA_NER | 51.28% | 43.75% | 47.22% | |
HarveyNER | 24.82% | 21.52% | 23.05% | |
MultiNERD | 59.27% | 80.69% | 68.34% | |
Ontonotes | 32.97% | 37.59% | 35.13% | |
PolyglotNER | 33.60% | 63.30% | 43.90% | |
TweetNER7 | 46.90% | 28.66% | 35.58% | |
WikiANN en | 51.91% | 55.43% | 53.61% | |
WikiNeural | 70.65% | 82.21% | 75.99% | |
bc2gm | 49.95% | 43.13% | 46.29% | |
bc4chemd | 35.88% | 71.64% | 47.81% | |
bc5cdr | 68.41% | 68.90% | 68.65% | |
ncbi | 55.31% | 59.87% | 57.50% | |
Average | | | 46.20% | |
----------------------------------------- | --------------------- | ----------- | -------- | ---------- |
gliner-community/gliner_medium-v2.5 | ACE 2004 | 33.06% | 20.96% | 25.66% |
ACE 2005 | 33.65% | 19.65% | 24.81% | |
AnatEM | 52.03% | 35.28% | 42.05% | |
Broad Tweet Corpus | 60.57% | 79.09% | 68.60% | |
CoNLL 2003 | 63.80% | 68.31% | 65.98% | |
FabNER | 26.20% | 22.26% | 24.07% | |
FindVehicle | 41.95% | 40.68% | 41.30% | |
GENIA_NER | 51.83% | 62.34% | 56.60% | |
HarveyNER | 14.04% | 32.17% | 19.55% | |
MultiNERD | 47.63% | 88.78% | 62.00% | |
Ontonotes | 21.68% | 38.41% | 27.71% | |
PolyglotNER | 32.73% | 68.27% | 44.24% | |
TweetNER7 | 40.39% | 37.64% | 38.97% | |
WikiANN en | 56.41% | 59.90% | 58.10% | |
WikiNeural | 65.61% | 86.28% | 74.54% | |
bc2gm | 55.20% | 56.71% | 55.95% | |
bc4chemd | 35.94% | 63.67% | 45.94% | |
bc5cdr | 63.50% | 70.09% | 66.63% | |
ncbi | 62.96% | 68.55% | 65.63% | |
Average | | | 47.81% | |
----------------------------------------- | --------------------- | ----------- | -------- | ---------- |
urchade/gliner_medium-v2.1 | ACE 2004 | 36.33% | 22.74% | 27.97% |
ACE 2005 | 40.49% | 25.46% | 31.27% | |
AnatEM | 59.75% | 16.87% | 26.31% | |
Broad Tweet Corpus | 60.89% | 67.25% | 63.91% | |
CoNLL 2003 | 60.62% | 62.39% | 61.50% | |
FabNER | 27.72% | 12.24% | 16.98% | |
FindVehicle | 41.55% | 31.31% | 35.71% | |
GENIA_NER | 60.86% | 43.93% | 51.03% | |
HarveyNER | 23.20% | 23.16% | 23.18% | |
MultiNERD | 41.25% | 83.74% | 55.27% | |
Ontonotes | 20.58% | 34.11% | 25.67% | |
PolyglotNER | 31.32% | 64.22% | 42.11% | |
TweetNER7 | 44.52% | 33.42% | 38.18% | |
WikiANN en | 54.57% | 56.47% | 55.51% | |
WikiNeural | 57.60% | 81.57% | 67.52% | |
bc2gm | 67.98% | 33.45% | 44.84% | |
bc4chemd | 45.66% | 52.00% | 48.62% | |
bc5cdr | 72.20% | 58.12% | 64.40% | |
ncbi | 73.12% | 49.74% | 59.20% | |
Average | | | 44.17% | |
----------------------------------------- | --------------------- | ----------- | -------- | ---------- |
EmergentMethods/gliner_news_medium-v2.1 | ACE 2004 | 39.21% | 17.24% | 23.95% |
ACE 2005 | 39.82% | 16.48% | 23.31% | |
AnatEM | 57.67% | 23.57% | 33.46% | |
Broad Tweet Corpus | 69.52% | 65.94% | 67.69% | |
CoNLL 2003 | 68.26% | 58.45% | 62.97% | |
FabNER | 30.74% | 15.51% | 20.62% | |
FindVehicle | 40.33% | 37.37% | 38.79% | |
GENIA_NER | 53.70% | 47.73% | 50.54% | |
HarveyNER | 26.29% | 27.05% | 26.67% | |
MultiNERD | 56.78% | 81.96% | 67.08% | |
Ontonotes | 30.90% | 35.86% | 33.19% | |
PolyglotNER | 35.98% | 60.96% | 45.25% | |
TweetNER7 | 52.37% | 28.18% | 38.55% | |
WikiANN en | 53.81% | 52.29% | 53.04% | |
WikiNeural | 76.84% | 78.92% | 77.86% | |
bc2gm | 62.97% | 44.24% | 51.96% | |
bc4chemd | 44.90% | 65.56% | 53.30% | |
bc5cdr | 73.93% | 67.03% | 70.31% | |
ncbi | 69.53% | 60.82% | 64.88% | |
Average | | | 47.55% | |
----------------------------------------- | --------------------- | ----------- | -------- | ---------- |
gliner-community/gliner_large-v2.5 | ACE 2004 | 31.64% | 22.81% | 26.51% |
ACE 2005 | 32.10% | 22.56% | 26.49% | |
AnatEM | 53.64% | 27.82% | 36.64% | |
Broad Tweet Corpus | 61.93% | 76.85% | 68.59% | |
CoNLL 2003 | 62.83% | 67.71% | 65.18% | |
FabNER | 24.54% | 27.03% | 25.73% | |
FindVehicle | 40.71% | 56.24% | 47.23% | |
GENIA_NER | 43.56% | 52.56% | 47.64% | |
HarveyNER | 14.85% | 27.05% | 19.17% | |
MultiNERD | 38.04% | 89.17% | 53.33% | |
Ontonotes | 17.28% | 40.16% | 24.16% | |
PolyglotNER | 32.88% | 63.31% | 43.28% | |
TweetNER7 | 38.03% | 41.43% | 39.66% | |
WikiANN en | 57.80% | 60.54% | 59.14% | |
WikiNeural | 67.72% | 83.94% | 74.96% | |
bc2gm | 54.74% | 48.54% | 51.45% | |
bc4chemd | 40.20% | 58.66% | 47.71% | |
bc5cdr | 66.27% | 71.95% | 69.00% | |
ncbi | 68.09% | 61.55% | 64.65% | |
Average | | | 46.87% | |
----------------------------------------- | --------------------- | ----------- | -------- | ---------- |
urchade/gliner_large-v2.1 | ACE 2004 | 37.52% | 25.38% | 30.28% |
ACE 2005 | 39.02% | 29.00% | 33.27% | |
AnatEM | 52.86% | 13.64% | 21.68% | |
Broad Tweet Corpus | 51.44% | 71.73% | 59.91% | |
CoNLL 2003 | 54.86% | 64.98% | 59.49% | |
FabNER | 23.98% | 16.00% | 19.19% | |
FindVehicle | 47.04% | 57.53% | 51.76% | |
GENIA_NER | 58.10% | 49.98% | 53.74% | |
HarveyNER | 16.29% | 21.93% | 18.69% | |
MultiNERD | 34.09% | 85.43% | 48.74% | |
Ontonotes | 14.02% | 32.01% | 19.50% | |
PolyglotNER | 28.53% | 64.92% | 39.64% | |
TweetNER7 | 38.00% | 34.34% | 36.08% | |
WikiANN en | 51.69% | 59.92% | 55.50% | |
WikiNeural | 50.94% | 82.08% | 62.87% | |
bc2gm | 64.48% | 32.47% | 43.19% | |
bc4chemd | 48.66% | 57.52% | 52.72% | |
bc5cdr | 72.19% | 64.27% | 68.00% | |
ncbi | 69.54% | 52.25% | 59.67% | |
Average | | | 43.89% | |
----------------------------------------- | --------------------- | ----------- | -------- | ---------- |
EmergentMethods/gliner_news_large-v2.1 | ACE 2004 | 43.19% | 18.39% | 25.80% |
ACE 2005 | 45.24% | 21.20% | 28.87% | |
AnatEM | 61.51% | 21.66% | 32.04% | |
Broad Tweet Corpus | 69.38% | 68.99% | 69.18% | |
CoNLL 2003 | 61.47% | 52.18% | 56.45% | |
FabNER | 27.42% | 19.11% | 22.52% | |
FindVehicle | 46.30% | 62.48% | 53.19% | |
GENIA_NER | 54.13% | 54.02% | 54.07% | |
HarveyNER | 15.91% | 15.78% | 15.84% | |
MultiNERD | 53.73% | 79.07% | 63.98% | |
Ontonotes | 26.78% | 39.77% | 32.01% | |
PolyglotNER | 34.28% | 55.87% | 42.49% | |
TweetNER7 | 48.06% | 28.18% | 35.53% | |
WikiANN en | 53.66% | 51.34% | 52.47% | |
WikiNeural | 69.81% | 70.75% | 70.28% | |
bc2gm | 59.83% | 37.62% | 46.20% | |
bc4chemd | 46.24% | 69.15% | 55.42% | |
bc5cdr | 71.94% | 70.37% | 71.15% | |
ncbi | 70.17% | 61.44% | 65.52% | |
Average | | | 47.00% | |
----------------------------------------- | --------------------- | ----------- | -------- | ---------- |
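The F1 scores in the tables above follow the usual harmonic-mean definition of precision and recall. A quick sketch reproduces, for example, the gliner_small-v2.5 / ACE 2004 row (P = 35.18%, R = 22.81%), landing within rounding of the tabulated 27.67%:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)

# Check against the gliner_small-v2.5 / ACE 2004 row of the table above.
print(f"{f1(35.18, 22.81):.2f}")
```

Recomputing each row this way is a simple sanity check when transcribing benchmark tables; sub-0.01 discrepancies just reflect rounding of the reported precision and recall.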
Other available models
Version | Model Name | # of Parameters | Language | License |
---|---|---|---|---|
v0 | urchade/gliner_base, urchade/gliner_multi | 209M, 209M | English, Multilingual | cc-by-nc-4.0 |
v1 | urchade/gliner_small-v1, urchade/gliner_medium-v1, urchade/gliner_large-v1 | 166M, 209M, 459M | English | cc-by-nc-4.0 |
v2 | urchade/gliner_small-v2, urchade/gliner_medium-v2, urchade/gliner_large-v2 | 166M, 209M, 459M | English | apache-2.0 |
v2.1 | urchade/gliner_small-v2.1, urchade/gliner_medium-v2.1, urchade/gliner_large-v2.1, urchade/gliner_multi-v2.1 | 166M, 209M, 459M, 209M | English, English, English, Multilingual | apache-2.0 |
Links
- Paper: https://arxiv.org/abs/2311.08526
- Code repository: https://github.com/urchade/GLiNER
🔧 Technical Details
GLiNER uses a bidirectional transformer encoder (BERT-like) to identify any entity type, providing a practical alternative that addresses the limitations of both traditional NER models and large language models.
📄 License
This project is released under the Apache 2.0 license.
Model Authors
The authors of the model are:
- Urchade Zaratiana
- Ihor Stepanov
- Nadi Tomeh
- Pierre Holat
- Thierry Charnois
Citation
If you use this model in your research, please cite it with the following BibTeX entry:
@misc{zaratiana2023gliner,
title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
year={2023},
eprint={2311.08526},
archivePrefix={arXiv},
primaryClass={cs.CL}
}