NERmemberta-3entities
A French named entity recognition model fine-tuned from CamemBERTa v2, recognizing three entity types: LOC, PER and ORG.
Model Overview
A BERT-based model dedicated to French named entity recognition, fine-tuned on a consolidated corpus of 420,264 French rows; it recognizes three entity types: locations, persons and organizations.
Model Highlights
- Multi-dataset training: five French NER datasets were merged and cleaned into a single training set (346,071 rows).
- Low carbon footprint: training emitted only 0.0335 kg CO2 eq. (computed with the French grid's carbon intensity).
- Ready-to-use API: Hugging Face pipeline integration and an online demo Space are provided.
Capabilities
- French named entity recognition
- LOC/PER/ORG entity classification
- Token classification
Use Cases
- Information extraction / news entity analysis: extract key entities from French news text (e.g. Olympics-related organizations, designer names), correctly identifying entities such as "Grand Rex" (LOC) and "Sylvain Boyer" (PER).
- Knowledge graph construction / entity relation mining: serves as a pre-processing step for building knowledge graphs.
🚀 NERmemBERTa-3entities
NERmemBERTa-3entities is a model fine-tuned from CamemBERTa v2 base for the French named entity recognition (NER) task. It was trained on five French NER datasets to recognize three entity types (LOC, PER, ORG).
🚀 Quick Start
Code example
```python
from transformers import pipeline

# Load the fine-tuned NER pipeline; "simple" aggregation merges sub-word
# tokens back into whole entity spans (one prediction per entity).
ner = pipeline(
    'token-classification',
    model='CATIE-AQ/NERmemberta-base-3entities',
    tokenizer='CATIE-AQ/NERmemberta-base-3entities',
    aggregation_strategy="simple",
)

result = ner(
    "Le dévoilement du logo officiel des JO s'est déroulé le 21 octobre 2019 au Grand Rex. Ce nouvel emblème et cette nouvelle typographie ont été conçus par le designer Sylvain Boyer avec les agences Royalties & Ecobranding. Rond, il rassemble trois symboles : une médaille d'or, la flamme olympique et Marianne, symbolisée par un visage de femme mais privée de son bonnet phrygien caractéristique. La typographie dessinée fait référence à l'Art déco, mouvement artistique des années 1920, décennie pendant laquelle ont eu lieu pour la dernière fois les Jeux olympiques à Paris en 1924. Pour la première fois, ce logo sera unique pour les Jeux olympiques et les Jeux paralympiques."
)
print(result)
```
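With aggregation enabled, the pipeline returns a list of dicts with `entity_group`, `word`, `score`, `start` and `end` keys, which can then be grouped by entity type — the kind of structured pre-processing the information-extraction and knowledge-graph use cases above rely on. A minimal sketch, reusing `result` from the snippet above; the entities in the final comment are illustrative only:

```python
from collections import defaultdict

# Group the aggregated pipeline predictions by entity type (LOC / PER / ORG).
entities = defaultdict(set)
for prediction in result:  # `result` comes from the quick-start snippet above
    entities[prediction["entity_group"]].add(prediction["word"])

print(dict(entities))
# Illustrative output: {'LOC': {'Grand Rex', 'Paris'}, 'PER': {'Sylvain Boyer', ...}, 'ORG': {...}}
```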
Try it in a Space
You can test the model via the Space available here.
✨ Key Features
- Multi-dataset training: trained on five French NER datasets totalling over 420,264 rows.
- High accuracy: F1 scores of 0.970 (PER), 0.943 (LOC) and 0.881 (ORG) on the frenchNER_3entities test set.
- Three entity types: recognizes LOC (locations), PER (persons) and ORG (organizations); the sketch below shows how to check the label set shipped with the checkpoint.
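To double-check which labels the checkpoint exposes, here is a minimal sketch that only reads the model configuration (it reuses the model ID from the quick-start snippet above; the exact label names stored in the config are not spelled out on this card):

```python
from transformers import AutoConfig

# Print the id-to-label mapping shipped with the checkpoint; it should cover
# "O" plus the LOC / PER / ORG tags listed above.
config = AutoConfig.from_pretrained("CATIE-AQ/NERmemberta-base-3entities")
print(config.id2label)
```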
📚 Documentation
Model Description
We present NERmemBERTa-3entities, fine-tuned from CamemBERTa v2 base for the French named entity recognition task. It was trained on five French NER datasets covering three entity types (LOC, PER, ORG). All of these datasets were merged and cleaned into a single dataset that we call frenchNER_3entities. It contains a total of over 420,264 rows, of which 346,071 are used for training, 32,951 for validation and 41,242 for testing. Our approach is described in detail in a blog post, available in English or in French.
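To inspect the merged corpus directly, a minimal sketch using the `datasets` library (it only assumes that the frenchNER_3entities dataset is publicly accessible with the train/validation/test splits described above):

```python
from datasets import load_dataset

# Load the merged dataset this model was fine-tuned on and report split sizes.
dataset = load_dataset("CATIE-AQ/frenchNER_3entities")
for split_name, split in dataset.items():
    print(split_name, len(split), split.column_names)
```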
Evaluation Results
frenchNER_3entities
For reasons of space, we only show the F1 scores of the different models. The full results are available below the table.

| Model | Parameters | Context | PER | LOC | ORG |
|---|---|---|---|---|---|
| Jean-Baptiste/camembert-ner | 110M | 512 tokens | 0.941 | 0.883 | 0.658 |
| cmarkea/distilcamembert-base-ner | 67.5M | 512 tokens | 0.942 | 0.882 | 0.647 |
| NERmembert-base-3entities | 110M | 512 tokens | 0.966 | 0.940 | 0.876 |
| NERmembert2-3entities | 111M | 1024 tokens | 0.967 | 0.942 | 0.875 |
| NERmemberta-3entities (this model) | 111M | 1024 tokens | 0.970 | 0.943 | 0.881 |
| NERmembert-large-3entities | 336M | 512 tokens | 0.969 | 0.947 | 0.890 |

Full results

| Model | Metric | PER | LOC | ORG | O | Overall |
|---|---|---|---|---|---|---|
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | Precision | 0.918 | 0.860 | 0.831 | 0.992 | 0.974 |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | Recall | 0.964 | 0.908 | 0.544 | 0.964 | 0.948 |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | F1 | 0.941 | 0.883 | 0.658 | 0.978 | 0.961 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | Precision | 0.929 | 0.861 | 0.813 | 0.991 | 0.974 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | Recall | 0.956 | 0.905 | 0.956 | 0.965 | 0.948 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | F1 | 0.942 | 0.882 | 0.647 | 0.978 | 0.961 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | Precision | 0.961 | 0.935 | 0.877 | 0.995 | 0.986 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | Recall | 0.972 | 0.946 | 0.876 | 0.994 | 0.986 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | F1 | 0.966 | 0.940 | 0.876 | 0.994 | 0.986 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | Precision | 0.964 | 0.935 | 0.872 | 0.995 | 0.985 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | Recall | 0.967 | 0.949 | 0.878 | 0.993 | 0.984 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | F1 | 0.967 | 0.942 | 0.875 | 0.994 | 0.985 |
| [NERmemberta-3entities (111M) (this model)](https://hf.co/CATIE-AQ/NERmemberta-3entities) | Precision | 0.966 | 0.934 | 0.880 | 0.995 | 0.985 |
| [NERmemberta-3entities (111M) (this model)](https://hf.co/CATIE-AQ/NERmemberta-3entities) | Recall | 0.973 | 0.952 | 0.883 | 0.993 | 0.985 |
| [NERmemberta-3entities (111M) (this model)](https://hf.co/CATIE-AQ/NERmemberta-3entities) | F1 | 0.970 | 0.943 | 0.881 | 0.994 | 0.985 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | Precision | 0.946 | 0.884 | 0.859 | 0.993 | 0.971 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | Recall | 0.955 | 0.904 | 0.550 | 0.993 | 0.971 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | F1 | 0.951 | 0.894 | 0.671 | 0.988 | 0.971 |

multiconer
For reasons of space, we only show the F1 scores of the different models. The full results are available below the table.

| Model | PER | LOC | ORG |
|---|---|---|---|
| Jean-Baptiste/camembert-ner (110M) | 0.940 | 0.761 | 0.723 |
| cmarkea/distilcamembert-base-ner (67.5M) | 0.921 | 0.748 | 0.694 |
| NERmembert-base-3entities (110M) | 0.960 | 0.887 | 0.876 |
| NERmembert2-3entities (111M) | 0.958 | 0.876 | 0.863 |
| NERmemberta-3entities (this model) | 0.964 | 0.865 | 0.859 |
| NERmembert-large-3entities (336M) | 0.965 | 0.902 | 0.896 |

Full results

| Model | Metric | PER | LOC | ORG | O | Overall |
|---|---|---|---|---|---|---|
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | Precision | 0.908 | 0.717 | 0.753 | 0.987 | 0.947 |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | Recall | 0.975 | 0.811 | 0.696 | 0.878 | 0.880 |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | F1 | 0.940 | 0.761 | 0.723 | 0.929 | 0.912 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | Precision | 0.885 | 0.738 | 0.737 | 0.983 | 0.943 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | Recall | 0.960 | 0.759 | 0.655 | 0.882 | 0.877 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | F1 | 0.921 | 0.748 | 0.694 | 0.930 | 0.909 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | Precision | 0.957 | 0.894 | 0.876 | 0.986 | 0.972 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | Recall | 0.962 | 0.880 | 0.878 | 0.985 | 0.972 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | F1 | 0.960 | 0.887 | 0.876 | 0.985 | 0.972 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | Precision | 0.951 | 0.906 | 0.853 | 0.984 | 0.967 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | Recall | 0.966 | 0.848 | 0.874 | 0.984 | 0.967 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | F1 | 0.958 | 0.876 | 0.863 | 0.984 | 0.967 |
| NERmemberta-3entities (this model) | Precision | 0.962 | 0.859 | 0.862 | 0.985 | 0.970 |
| NERmemberta-3entities (this model) | Recall | 0.967 | 0.871 | 0.857 | 0.984 | 0.970 |
| NERmemberta-3entities (this model) | F1 | 0.964 | 0.865 | 0.859 | 0.985 | 0.970 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | Precision | 0.960 | 0.903 | 0.916 | 0.987 | 0.976 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | Recall | 0.969 | 0.900 | 0.877 | 0.987 | 0.976 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | F1 | 0.965 | 0.902 | 0.896 | 0.987 | 0.976 |

multinerd
For reasons of space, we only show the F1 scores of the different models. The full results are available below the table.

| Model | PER | LOC | ORG |
|---|---|---|---|
| Jean-Baptiste/camembert-ner (110M) | 0.962 | 0.934 | 0.888 |
| cmarkea/distilcamembert-base-ner (67.5M) | 0.972 | 0.938 | 0.884 |
| NERmembert-base-3entities (110M) | 0.985 | 0.973 | 0.938 |
| NERmembert2-3entities (111M) | 0.985 | 0.972 | 0.933 |
| NERmemberta-3entities (this model) | 0.986 | 0.974 | 0.945 |
| NERmembert-large-3entities (336M) | 0.987 | 0.979 | 0.953 |

Full results

| Model | Metric | PER | LOC | ORG | O | Overall |
|---|---|---|---|---|---|---|
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | Precision | 0.931 | 0.893 | 0.827 | 0.999 | 0.988 |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | Recall | 0.994 | 0.980 | 0.959 | 0.973 | 0.974 |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | F1 | 0.962 | 0.934 | 0.888 | 0.986 | 0.981 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | Precision | 0.954 | 0.908 | 0.817 | 0.999 | 0.990 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | Recall | 0.991 | 0.969 | 0.963 | 0.975 | 0.975 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | F1 | 0.972 | 0.938 | 0.884 | 0.987 | 0.983 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | Precision | 0.974 | 0.965 | 0.910 | 0.999 | 0.995 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | Recall | 0.995 | 0.981 | 0.968 | 0.996 | 0.995 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | F1 | 0.985 | 0.973 | 0.938 | 0.998 | 0.995 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | Precision | 0.975 | 0.960 | 0.902 | 0.999 | 0.995 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | Recall | 0.995 | 0.985 | 0.967 | 0.995 | 0.995 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | F1 | 0.985 | 0.972 | 0.933 | 0.997 | 0.995 |
| NERmemberta-3entities (this model) | Precision | 0.976 | 0.961 | 0.915 | 0.999 | 0.995 |
| NERmemberta-3entities (this model) | Recall | 0.997 | 0.987 | 0.976 | 0.996 | 0.995 |
| NERmemberta-3entities (this model) | F1 | 0.986 | 0.974 | 0.945 | 0.997 | 0.995 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | Precision | 0.979 | 0.970 | 0.927 | 0.999 | 0.996 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | Recall | 0.996 | 0.987 | 0.980 | 0.997 | 0.996 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | F1 | **0.987** | **0.979** | **0.953** | **0.998** | **0.996** |

wikiner
For reasons of space, we only show the F1 scores of the different models. The full results are available below the table.

| Model | PER | LOC | ORG |
|---|---|---|---|
| Jean-Baptiste/camembert-ner (110M) | 0.986 | 0.966 | 0.938 |
| cmarkea/distilcamembert-base-ner (67.5M) | 0.983 | 0.964 | 0.925 |
| NERmembert-base-3entities (110M) | 0.969 | 0.945 | 0.878 |
| NERmembert2-3entities (111M) | 0.969 | 0.946 | 0.866 |
| NERmemberta-3entities (this model) | 0.971 | 0.948 | 0.885 |
| NERmembert-large-3entities (336M) | 0.972 | 0.950 | 0.893 |

Full results

| Model | Metric | PER | LOC | ORG | O | Overall |
|---|---|---|---|---|---|---|
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | Precision | 0.986 | 0.962 | 0.925 | 0.999 | 0.994 |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | Recall | 0.987 | 0.969 | 0.951 | 0.965 | 0.967 |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | F1 | **0.986** | **0.966** | **0.938** | **0.982** | **0.980** |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | Precision | 0.982 | 0.951 | 0.910 | 0.998 | 0.994 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | Recall | 0.985 | 0.963 | 0.940 | 0.966 | 0.967 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | F1 | 0.983 | 0.964 | 0.925 | 0.982 | 0.980 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | Precision | 0.971 | 0.947 | 0.866 | 0.994 | 0.989 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | Recall | 0.969 | 0.942 | 0.891 | 0.995 | 0.989 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | F1 | 0.969 | 0.945 | 0.878 | 0.995 | 0.989 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | Precision | 0.971 | 0.946 | 0.863 | 0.994 | 0.988 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | Recall | 0.967 | 0.946 | 0.870 | 0.995 | 0.988 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | F1 | 0.969 | 0.946 | 0.866 | 0.994 | 0.988 |
| NERmemberta-3entities (this model) | Precision | 0.972 | 0.946 | 0.865 | 0.995 | 0.987 |
| NERmemberta-3entities (this model) | Recall | 0.970 | 0.950 | 0.905 | 0.995 | 0.987 |
| NERmemberta-3entities (this model) | F1 | 0.971 | 0.948 | 0.885 | 0.995 | 0.987 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | Precision | 0.973 | 0.953 | 0.873 | 0.996 | 0.990 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | Recall | 0.990 | 0.948 | 0.913 | 0.995 | 0.990 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | F1 | 0.972 | 0.950 | 0.893 | 0.996 | 0.990 |

wikiann
For reasons of space, we only show the F1 scores of the different models. The full results are available below the table.

| Model | PER | LOC | ORG |
|---|---|---|---|
| Jean-Baptiste/camembert-ner (110M) | 0.867 | 0.722 | 0.451 |
| cmarkea/distilcamembert-base-ner (67.5M) | 0.862 | 0.722 | 0.451 |
| NERmembert-base-3entities (110M) | 0.947 | 0.906 | 0.886 |
| NERmembert2-3entities (111M) | 0.950 | 0.911 | 0.910 |
| NERmemberta-3entities (this model) | 0.953 | 0.902 | 0.890 |
| NERmembert-large-3entities (336M) | 0.949 | 0.912 | 0.899 |

Full results

| Model | Metric | PER | LOC | ORG | O | Overall |
|---|---|---|---|---|---|---|
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | Precision | 0.862 | 0.700 | 0.864 | 0.867 | 0.832 |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | Recall | 0.871 | 0.746 | 0.305 | 0.950 | 0.772 |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | F1 | 0.867 | 0.722 | 0.451 | 0.867 | 0.801 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | Precision | 0.862 | 0.700 | 0.864 | 0.867 | 0.832 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | Recall | 0.871 | 0.746 | 0.305 | 0.950 | 0.772 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | F1 | 0.867 | 0.722 | 0.451 | 0.907 | 0.800 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | Precision | 0.948 | 0.900 | 0.893 | 0.979 | 0.942 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | Recall | 0.946 | 0.911 | 0.878 | 0.982 | 0.942 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | F1 | 0.947 | 0.906 | 0.886 | 0.980 | 0.942 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | Precision | 0.962 | 0.906 | 0.890 | 0.971 | 0.941 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | Recall | 0.938 | 0.917 | 0.884 | 0.982 | 0.941 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | F1 | 0.950 | 0.911 | 0.887 | 0.976 | 0.941 |
| NERmemberta-3entities (this model) | Precision | 0.961 | 0.902 | 0.899 | 0.972 | 0.942 |
| NERmemberta-3entities (this model) | Recall | 0.946 | 0.918 | 0.881 | 0.982 | 0.942 |
| NERmemberta-3entities (this model) | F1 | 0.953 | 0.902 | 0.890 | 0.977 | 0.942 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | Precision | 0.958 | 0.917 | 0.897 | 0.980 | **0.948** |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | Recall | 0.940 | 0.915 | 0.901 | 0.983 | **0.948** |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | F1 | **0.949** | **0.912** | **0.899** | **0.983** | **0.948** |

🔧 Technical Details
Environmental Impact
Carbon emissions were estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). The hardware, runtime, cloud provider and compute region were used to estimate the carbon impact.
- Hardware type: A100 PCIe 40/80GB
- Hours used: 2h51min
- Cloud provider: Private infrastructure
- Carbon efficiency (kg/kWh): 0.047 (estimated from electricitymaps data for November 20, 2024)
- Carbon emitted (power consumption × time × carbon produced based on the location of the power grid): 0.0335 kg CO2 eq. (a rough reproduction of this figure is sketched below)
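As a rough sanity check on that figure, here is a minimal sketch; the ~250 W average draw (the TDP of an A100 PCIe 40GB) is an assumption on our part, since the card only states the hardware, runtime and grid carbon intensity:

```python
# Back-of-the-envelope reproduction of the carbon estimate above.
# ASSUMPTION: an average draw of ~0.25 kW (A100 PCIe 40GB TDP).
power_kw = 0.25            # assumed average GPU power draw, in kW
hours = 2 + 51 / 60        # 2 h 51 min of training
grid_kg_per_kwh = 0.047    # French grid intensity on 2024-11-20 (kg CO2 eq./kWh)

co2_kg = power_kw * hours * grid_kg_per_kwh
print(f"{co2_kg:.4f} kg CO2 eq.")  # ≈ 0.0335, matching the reported figure
```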
📄 License
This project is released under the MIT license.
📚 Citation
NERmemBERTa-3entities
@misc {NERmemberta2024,
author = { {BOURDOIS, Loïck} },
organization = { {Centre Aquitain des Technologies de l'Information et Electroniques} },
title = { NERmemberta-3entities (Revision 989f2ee) },
year = 2024,
url = { https://huggingface.co/CATIE-AQ/NERmemberta-3entities },
doi = { 10.57967/hf/3640 },
publisher = { Hugging Face }
}
NERmemBERT
@misc {NERmembert2024,
author = { {BOURDOIS, Loïck} },
organization = { {Centre Aquitain des Technologies de l'Information et Electroniques} },
title = { NERmembert-base-3entities },
year = 2024,
url = { https://huggingface.co/CATIE-AQ/NERmembert-base-3entities },
doi = { 10.57967/hf/1752 },
publisher = { Hugging Face }
}
CamemBERT
@inproceedings{martin2020camembert,
title={CamemBERT: a Tasty French Language Model},
author={Martin, Louis and Muller, Benjamin and Su{\'a}rez, Pedro Javier Ortiz and Dupont, Yoann and Romary, Laurent and de la Clergerie, {\'E}ric Villemonte and Seddah, Djam{\'e} and Sagot, Beno{\^\i}t},
booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
year={2020}}
CamemBERT 2.0
@misc{antoun2024camembert20smarterfrench,
title={CamemBERT 2.0: A Smarter French Language Model Aged to Perfection},
author={Wissam Antoun and Francis Kulumba and Rian Touchent and Éric de la Clergerie and Benoît Sagot and Djamé Seddah},
year={2024},
eprint={2411.08868},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2411.08868},
}
multiconer
@inproceedings{multiconer2-report,
title={{SemEval-2023 Task 2: Fine-grained Multilingual Named Entity Recognition (MultiCoNER 2)}},
author={Fetahu, Besnik and Kar, Sudipta and Chen, Zhiyu and Rokhlenko, Oleg and Malmasi, Shervin},
booktitle={Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)},
year={2023},
publisher={Association for Computational Linguistics}}
@article{multiconer2-data,
title={{MultiCoNER v2: a Large Multilingual dataset for Fine-grained and Noisy Named Entity Recognition}},
author={Fetahu, Besnik and Chen, Zhiyu and Kar, Sudipta and Rokhlenko, Oleg and Malmasi, Shervin},
year={2023}}
multinerd
@inproceedings{tedeschi-navigli-2022-multinerd,
title = "{M}ulti{NERD}: A Multilingual, Multi-Genre and Fine-Grained Dataset for Named Entity Recognition (and Disambiguation)",
author = "Tedeschi, Simone and Navigli, Roberto",
booktitle = "Findings of the Association for Computational Linguistics: NAACL 2022",
month = jul,
year = "2022",
address = "Seattle, United States",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.findings-naacl.60",
doi = "10.18653/v1/2022.findings-naacl.60",
pages = "801--812"}
pii-masking-200k
@misc {ai4privacy_2023,
author = { {ai4Privacy} },
title = { pii-masking-200k (Revision 1d4c0a1) },
year = 2023,
url = { https://huggingface.co/datasets/ai4privacy/pii-masking-200k },
doi = { 10.57967/hf/1532 },
publisher = { Hugging Face }}
wikiann
@inproceedings{rahimi-etal-2019-massively,
title = "Massively Multilingual Transfer for {NER}",
author = "Rahimi, Afshin and Li, Yuan and Cohn, Trevor",
booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2019",
address = "Florence, Italy",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/P19-1015",
pages = "151--164"}
wikiner
@article{NOTHMAN2013151,
title = {Learning multilingual named entity recognition from Wikipedia},
journal = {Artificial Intelligence},
volume = {194},
pages = {151-175},
year = {2013},
note = {Artificial Intelligence, Wikipedia and Semi-Structured Resources},
issn = {0004-3702},
doi = {https://doi.org/10.1016/j.artint.2012.03.006},
url = {https://www.sciencedirect.com/science/article/pii/S0004370212000276},
author = {Joel Nothman and Nicky Ringland and Will Radford and Tara Murphy and James R. Curran}}
frenchNER_3entities
@misc {frenchNER2024,
author = { {BOURDOIS, Loïck} },
organization = { {Centre Aquitain des Technologies de l'Information et Electroniques} },
title = { frenchNER_3entities },
year = 2024,
url = { https://huggingface.co/CATIE-AQ/frenchNER_3entities },
doi = { 10.57967/hf/1751 },
publisher = { Hugging Face }
}