NERmemberta-3entities
A French named-entity-recognition model fine-tuned from CamemBERTa v2; recognizes three entity types (LOC/PER/ORG)
Downloads: 124
Released: 11/20/2024
Model Overview
A BERT-family model dedicated to French named entity recognition, fine-tuned on a merged corpus of 420,264 French rows; it recognizes three entity types: locations, persons, and organizations.
Model Highlights
Multi-dataset training
Merges five French NER datasets, cleaned into a single training set (346,071 rows)
Low carbon footprint
Training emitted only 0.0335 kg CO2eq (computed with the French grid's carbon-intensity factor)
Ready-to-use API
Comes with Hugging Face pipeline integration and an online demo Space
Capabilities
French named entity recognition
LOC/PER/ORG entity classification
Token classification
Use Cases
Information extraction
News entity analysis
Extracts key entities from French news text (e.g., Olympics-related venues and designer names)
Accurately identifies entities such as "Grand Rex (LOC)" and "Sylvain Boyer (PER)"
Knowledge graph construction
Entity relation mining
Serves as a preprocessing tool for knowledge graph construction
🚀 NERmemBERTa-3entities
NERmemBERTa-3entities is a model fine-tuned from CamemBERTa v2 base for French named entity recognition (NER). It was trained on five French NER datasets to recognize three entity types (LOC, PER, ORG).
🚀 Quick Start
Code example
```python
from transformers import pipeline

# Load the NER pipeline; "simple" aggregation merges sub-word tokens into whole entities.
ner = pipeline(
    "token-classification",
    model="CATIE-AQ/NERmemberta-base-3entities",
    tokenizer="CATIE-AQ/NERmemberta-base-3entities",
    aggregation_strategy="simple",
)
result = ner(
    "Le dévoilement du logo officiel des JO s'est déroulé le 21 octobre 2019 au Grand Rex. Ce nouvel emblème et cette nouvelle typographie ont été conçus par le designer Sylvain Boyer avec les agences Royalties & Ecobranding. Rond, il rassemble trois symboles : une médaille d'or, la flamme olympique et Marianne, symbolisée par un visage de femme mais privée de son bonnet phrygien caractéristique. La typographie dessinée fait référence à l'Art déco, mouvement artistique des années 1920, décennie pendant laquelle ont eu lieu pour la dernière fois les Jeux olympiques à Paris en 1924. Pour la première fois, ce logo sera unique pour les Jeux olympiques et les Jeux paralympiques."
)
print(result)
```
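With aggregation_strategy="simple", the pipeline returns one dict per detected entity, with keys entity_group, score, word, start, and end. A minimal sketch of grouping that output by entity type follows; the `result` list below is a hand-written sample in that shape, not actual model output:

```python
# Illustrative post-processing of a token-classification pipeline result.
# The entries below are a hand-written sample, not real model predictions.
result = [
    {"entity_group": "LOC", "score": 0.99, "word": "Grand Rex", "start": 75, "end": 84},
    {"entity_group": "PER", "score": 0.99, "word": "Sylvain Boyer", "start": 165, "end": 178},
]

# Collect entity surface forms under their predicted type (LOC/PER/ORG).
by_type = {}
for ent in result:
    by_type.setdefault(ent["entity_group"], []).append(ent["word"])
print(by_type)  # {'LOC': ['Grand Rex'], 'PER': ['Sylvain Boyer']}
```

The same loop works unchanged on real pipeline output, since only the documented keys are used.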
Try it in a Space
You can test the model in the Space here.
✨ Key Features
- Multi-dataset training: trained on five French NER datasets totaling 420,264 rows.
- High accuracy: strong performance across evaluation metrics such as F1.
- Three entity types: recognizes LOC (locations), PER (persons), and ORG (organizations).
📚 Documentation
Model Description
NERmemBERTa-3entities is fine-tuned from CamemBERTa v2 base for French named entity recognition, trained on five French NER datasets covering three entity types (LOC, PER, ORG). These datasets were merged and cleaned into a single dataset that we call frenchNER_3entities. It contains 420,264 rows in total: 346,071 for training, 32,951 for validation, and 41,242 for testing. Our approach is described in detail in a blog post, available in English and in French.
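The split sizes quoted above can be sanity-checked in a couple of lines (the split names used here are illustrative):

```python
# Row counts of frenchNER_3entities as stated in the model description.
splits = {"train": 346_071, "validation": 32_951, "test": 41_242}
total = sum(splits.values())
print(total)  # 420264

# Share of each split in the merged dataset.
for name, rows in splits.items():
    print(f"{name}: {rows / total:.1%}")
```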
Evaluation Results
frenchNER_3entities
For space reasons, only the F1 scores of the different models are shown here; full results appear below the table.
Model | Parameters | Context | PER | LOC | ORG |
---|---|---|---|---|---|
Jean-Baptiste/camembert-ner | 110M | 512 tokens | 0.941 | 0.883 | 0.658 |
cmarkea/distilcamembert-base-ner | 67.5M | 512 tokens | 0.942 | 0.882 | 0.647 |
NERmembert-base-3entities | 110M | 512 tokens | 0.966 | 0.940 | 0.876 |
NERmembert2-3entities | 111M | 1024 tokens | 0.967 | 0.942 | 0.875 |
NERmemberta-3entities (this model) | 111M | 1024 tokens | 0.970 | 0.943 | 0.881 |
NERmembert-large-3entities | 336M | 512 tokens | 0.969 | 0.947 | 0.890 |
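The F1 scores in these tables are the harmonic mean of precision and recall. As a quick check, a sketch using the Jean-Baptiste/camembert-ner PER precision/recall from the full results below (small differences are expected, since the tables round from unrounded values):

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# PER column for Jean-Baptiste/camembert-ner: precision 0.918, recall 0.964.
print(round(f1(0.918, 0.964), 3))  # 0.94, within rounding of the reported 0.941
```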
Full results
| Model | Metric | PER | LOC | ORG | O | Overall |
|------|------|------|------|------|------|------|
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | Precision | 0.918 | 0.860 | 0.831 | 0.992 | 0.974 |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | Recall | 0.964 | 0.908 | 0.544 | 0.964 | 0.948 |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | F1 | 0.941 | 0.883 | 0.658 | 0.978 | 0.961 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | Precision | 0.929 | 0.861 | 0.813 | 0.991 | 0.974 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | Recall | 0.956 | 0.905 | 0.956 | 0.965 | 0.948 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | F1 | 0.942 | 0.882 | 0.647 | 0.978 | 0.961 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | Precision | 0.961 | 0.935 | 0.877 | 0.995 | 0.986 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | Recall | 0.972 | 0.946 | 0.876 | 0.994 | 0.986 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | F1 | 0.966 | 0.940 | 0.876 | 0.994 | 0.986 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | Precision | 0.964 | 0.935 | 0.872 | 0.995 | 0.985 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | Recall | 0.967 | 0.949 | 0.878 | 0.993 | 0.984 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | F1 | 0.967 | 0.942 | 0.875 | 0.994 | 0.985 |
| [NERmemberta-3entities (111M) (this model)](https://hf.co/CATIE-AQ/NERmemberta-3entities) | Precision | 0.966 | 0.934 | 0.880 | 0.995 | 0.985 |
| [NERmemberta-3entities (111M) (this model)](https://hf.co/CATIE-AQ/NERmemberta-3entities) | Recall | 0.973 | 0.952 | 0.883 | 0.993 | 0.985 |
| [NERmemberta-3entities (111M) (this model)](https://hf.co/CATIE-AQ/NERmemberta-3entities) | F1 | 0.970 | 0.943 | 0.881 | 0.994 | 0.985 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | Precision | 0.946 | 0.884 | 0.859 | 0.993 | 0.971 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | Recall | 0.955 | 0.904 | 0.550 | 0.993 | 0.971 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | F1 | 0.951 | 0.894 | 0.671 | 0.988 | 0.971 |

multiconer
For space reasons, only the F1 scores of the different models are shown here; full results appear below the table.
Model | PER | LOC | ORG |
---|---|---|---|
Jean-Baptiste/camembert-ner (110M) | 0.940 | 0.761 | 0.723 |
cmarkea/distilcamembert-base-ner (67.5M) | 0.921 | 0.748 | 0.694 |
NERmembert-base-3entities (110M) | 0.960 | 0.887 | 0.876 |
NERmembert2-3entities (111M) | 0.958 | 0.876 | 0.863 |
NERmemberta-3entities (this model) | 0.964 | 0.865 | 0.859 |
NERmembert-large-3entities (336M) | 0.965 | 0.902 | 0.896 |
Full results
| Model | Metric | PER | LOC | ORG | O | Overall |
|------|------|------|------|------|------|------|
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | Precision | 0.908 | 0.717 | 0.753 | 0.987 | 0.947 |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | Recall | 0.975 | 0.811 | 0.696 | 0.878 | 0.880 |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | F1 | 0.940 | 0.761 | 0.723 | 0.929 | 0.912 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | Precision | 0.885 | 0.738 | 0.737 | 0.983 | 0.943 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | Recall | 0.960 | 0.759 | 0.655 | 0.882 | 0.877 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | F1 | 0.921 | 0.748 | 0.694 | 0.930 | 0.909 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | Precision | 0.957 | 0.894 | 0.876 | 0.986 | 0.972 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | Recall | 0.962 | 0.880 | 0.878 | 0.985 | 0.972 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | F1 | 0.960 | 0.887 | 0.876 | 0.985 | 0.972 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | Precision | 0.951 | 0.906 | 0.853 | 0.984 | 0.967 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | Recall | 0.966 | 0.848 | 0.874 | 0.984 | 0.967 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | F1 | 0.958 | 0.876 | 0.863 | 0.984 | 0.967 |
| NERmemberta-3entities (this model) | Precision | 0.962 | 0.859 | 0.862 | 0.985 | 0.970 |
| NERmemberta-3entities (this model) | Recall | 0.967 | 0.871 | 0.857 | 0.984 | 0.970 |
| NERmemberta-3entities (this model) | F1 | 0.964 | 0.865 | 0.859 | 0.985 | 0.970 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | Precision | 0.960 | 0.903 | 0.916 | 0.987 | 0.976 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | Recall | 0.969 | 0.900 | 0.877 | 0.987 | 0.976 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | F1 | 0.965 | 0.902 | 0.896 | 0.987 | 0.976 |

multinerd
For space reasons, only the F1 scores of the different models are shown here; full results appear below the table.
Model | PER | LOC | ORG |
---|---|---|---|
Jean-Baptiste/camembert-ner (110M) | 0.962 | 0.934 | 0.888 |
cmarkea/distilcamembert-base-ner (67.5M) | 0.972 | 0.938 | 0.884 |
NERmembert-base-3entities (110M) | 0.985 | 0.973 | 0.938 |
NERmembert2-3entities (111M) | 0.985 | 0.972 | 0.933 |
NERmemberta-3entities (this model) | 0.986 | 0.974 | 0.945 |
NERmembert-large-3entities (336M) | 0.987 | 0.979 | 0.953 |
Full results
| Model | Metric | PER | LOC | ORG | O | Overall |
|------|------|------|------|------|------|------|
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | Precision | 0.931 | 0.893 | 0.827 | 0.999 | 0.988 |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | Recall | 0.994 | 0.980 | 0.959 | 0.973 | 0.974 |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | F1 | 0.962 | 0.934 | 0.888 | 0.986 | 0.981 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | Precision | 0.954 | 0.908 | 0.817 | 0.999 | 0.990 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | Recall | 0.991 | 0.969 | 0.963 | 0.975 | 0.975 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | F1 | 0.972 | 0.938 | 0.884 | 0.987 | 0.983 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | Precision | 0.974 | 0.965 | 0.910 | 0.999 | 0.995 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | Recall | 0.995 | 0.981 | 0.968 | 0.996 | 0.995 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | F1 | 0.985 | 0.973 | 0.938 | 0.998 | 0.995 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | Precision | 0.975 | 0.960 | 0.902 | 0.999 | 0.995 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | Recall | 0.995 | 0.985 | 0.967 | 0.995 | 0.995 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | F1 | 0.985 | 0.972 | 0.933 | 0.997 | 0.995 |
| NERmemberta-3entities (this model) | Precision | 0.976 | 0.961 | 0.915 | 0.999 | 0.995 |
| NERmemberta-3entities (this model) | Recall | 0.997 | 0.987 | 0.976 | 0.996 | 0.995 |
| NERmemberta-3entities (this model) | F1 | 0.986 | 0.974 | 0.945 | 0.997 | 0.995 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | Precision | 0.979 | 0.970 | 0.927 | 0.999 | 0.996 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | Recall | 0.996 | 0.987 | 0.980 | 0.997 | 0.996 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | F1 | **0.987** | **0.979** | **0.953** | **0.998** | **0.996** |

wikiner
For space reasons, only the F1 scores of the different models are shown here; full results appear below the table.
Model | PER | LOC | ORG |
---|---|---|---|
Jean-Baptiste/camembert-ner (110M) | 0.986 | 0.966 | 0.938 |
cmarkea/distilcamembert-base-ner (67.5M) | 0.983 | 0.964 | 0.925 |
NERmembert-base-3entities (110M) | 0.969 | 0.945 | 0.878 |
NERmembert2-3entities (111M) | 0.969 | 0.946 | 0.866 |
NERmemberta-3entities (this model) | 0.971 | 0.948 | 0.885 |
NERmembert-large-3entities (336M) | 0.972 | 0.950 | 0.893 |
Full results
| Model | Metric | PER | LOC | ORG | O | Overall |
|------|------|------|------|------|------|------|
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | Precision | 0.986 | 0.962 | 0.925 | 0.999 | 0.994 |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | Recall | 0.987 | 0.969 | 0.951 | 0.965 | 0.967 |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | F1 | **0.986** | **0.966** | **0.938** | **0.982** | **0.980** |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | Precision | 0.982 | 0.951 | 0.910 | 0.998 | 0.994 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | Recall | 0.985 | 0.963 | 0.940 | 0.966 | 0.967 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | F1 | 0.983 | 0.964 | 0.925 | 0.982 | 0.980 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | Precision | 0.971 | 0.947 | 0.866 | 0.994 | 0.989 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | Recall | 0.969 | 0.942 | 0.891 | 0.995 | 0.989 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | F1 | 0.969 | 0.945 | 0.878 | 0.995 | 0.989 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | Precision | 0.971 | 0.946 | 0.863 | 0.994 | 0.988 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | Recall | 0.967 | 0.946 | 0.870 | 0.995 | 0.988 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | F1 | 0.969 | 0.946 | 0.866 | 0.994 | 0.988 |
| NERmemberta-3entities (this model) | Precision | 0.972 | 0.946 | 0.865 | 0.995 | 0.987 |
| NERmemberta-3entities (this model) | Recall | 0.970 | 0.950 | 0.905 | 0.995 | 0.987 |
| NERmemberta-3entities (this model) | F1 | 0.971 | 0.948 | 0.885 | 0.995 | 0.987 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | Precision | 0.973 | 0.953 | 0.873 | 0.996 | 0.990 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | Recall | 0.990 | 0.948 | 0.913 | 0.995 | 0.990 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | F1 | 0.972 | 0.950 | 0.893 | 0.996 | 0.990 |

wikiann
For space reasons, only the F1 scores of the different models are shown here; full results appear below the table.
Model | PER | LOC | ORG |
---|---|---|---|
Jean-Baptiste/camembert-ner (110M) | 0.867 | 0.722 | 0.451 |
cmarkea/distilcamembert-base-ner (67.5M) | 0.862 | 0.722 | 0.451 |
NERmembert-base-3entities (110M) | 0.947 | 0.906 | 0.886 |
NERmembert2-3entities (111M) | 0.950 | 0.911 | 0.910 |
NERmemberta-3entities (this model) | 0.953 | 0.902 | 0.890 |
NERmembert-large-3entities (336M) | 0.949 | 0.912 | 0.899 |
Full results
| Model | Metric | PER | LOC | ORG | O | Overall |
|------|------|------|------|------|------|------|
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | Precision | 0.862 | 0.700 | 0.864 | 0.867 | 0.832 |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | Recall | 0.871 | 0.746 | 0.305 | 0.950 | 0.772 |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | F1 | 0.867 | 0.722 | 0.451 | 0.867 | 0.801 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | Precision | 0.862 | 0.700 | 0.864 | 0.867 | 0.832 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | Recall | 0.871 | 0.746 | 0.305 | 0.950 | 0.772 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | F1 | 0.867 | 0.722 | 0.451 | 0.907 | 0.800 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | Precision | 0.948 | 0.900 | 0.893 | 0.979 | 0.942 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | Recall | 0.946 | 0.911 | 0.878 | 0.982 | 0.942 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | F1 | 0.947 | 0.906 | 0.886 | 0.980 | 0.942 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | Precision | 0.962 | 0.906 | 0.890 | 0.971 | 0.941 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | Recall | 0.938 | 0.917 | 0.884 | 0.982 | 0.941 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | F1 | 0.950 | 0.911 | 0.887 | 0.976 | 0.941 |
| NERmemberta-3entities (this model) | Precision | 0.961 | 0.902 | 0.899 | 0.972 | 0.942 |
| NERmemberta-3entities (this model) | Recall | 0.946 | 0.918 | 0.881 | 0.982 | 0.942 |
| NERmemberta-3entities (this model) | F1 | 0.953 | 0.902 | 0.890 | 0.977 | 0.942 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | Precision | 0.958 | 0.917 | 0.897 | 0.980 | **0.948** |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | Recall | 0.940 | 0.915 | 0.901 | 0.983 | **0.948** |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | F1 | **0.949** | **0.912** | **0.899** | **0.983** | **0.948** |

🔧 Technical Details
Environmental Impact
Carbon emissions were estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). The hardware, runtime, cloud provider, and compute region were used to estimate the carbon impact.
- Hardware type: A100 PCIe 40/80GB
- Hours used: 2h51min
- Cloud provider: private infrastructure
- Carbon efficiency (kg/kWh): 0.047 (estimated from electricitymaps data for November 20, 2024)
- Carbon emitted (power consumption × time × carbon produced based on the location of the power grid): 0.0335 kg CO2eq
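The reported figure follows the stated formula, power × time × grid carbon intensity. The card does not state the average power draw, so the sketch below derives it from the reported numbers; the ~250 W value is an inference, not a figure from the card:

```python
# Back-of-the-envelope check of the environmental-impact figures above.
carbon_intensity = 0.047   # kg CO2eq per kWh (French grid, 2024-11-20)
emissions = 0.0335         # kg CO2eq reported for the training run
hours = 2 + 51 / 60        # 2 h 51 min of A100 time

# Invert the formula: energy consumed, then the implied average power draw.
energy_kwh = emissions / carbon_intensity     # ≈ 0.713 kWh
avg_power_w = energy_kwh / hours * 1000       # ≈ 250 W (inferred, not stated)
print(f"{energy_kwh:.3f} kWh, ~{avg_power_w:.0f} W average")
```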
📄 License
This project is released under the MIT license.
📚 Citation
NERmemBERTa-3entities
@misc {NERmemberta2024,
author = { {BOURDOIS, Loïck} },
organization = { {Centre Aquitain des Technologies de l'Information et Electroniques} },
title = { NERmemberta-3entities (Revision 989f2ee) },
year = 2024,
url = { https://huggingface.co/CATIE-AQ/NERmemberta-3entities },
doi = { 10.57967/hf/3640 },
publisher = { Hugging Face }
}
NERmemBERT
@misc {NERmembert2024,
author = { {BOURDOIS, Loïck} },
organization = { {Centre Aquitain des Technologies de l'Information et Electroniques} },
title = { NERmembert-base-3entities },
year = 2024,
url = { https://huggingface.co/CATIE-AQ/NERmembert-base-3entities },
doi = { 10.57967/hf/1752 },
publisher = { Hugging Face }
}
CamemBERT
@inproceedings{martin2020camembert,
title={CamemBERT: a Tasty French Language Model},
author={Martin, Louis and Muller, Benjamin and Su{\'a}rez, Pedro Javier Ortiz and Dupont, Yoann and Romary, Laurent and de la Clergerie, {\'E}ric Villemonte and Seddah, Djam{\'e} and Sagot, Beno{\^\i}t},
booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
year={2020}}
CamemBERT 2.0
@misc{antoun2024camembert20smarterfrench,
title={CamemBERT 2.0: A Smarter French Language Model Aged to Perfection},
author={Wissam Antoun and Francis Kulumba and Rian Touchent and Éric de la Clergerie and Benoît Sagot and Djamé Seddah},
year={2024},
eprint={2411.08868},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2411.08868},
}
multiconer
@inproceedings{multiconer2-report,
title={{SemEval-2023 Task 2: Fine-grained Multilingual Named Entity Recognition (MultiCoNER 2)}},
author={Fetahu, Besnik and Kar, Sudipta and Chen, Zhiyu and Rokhlenko, Oleg and Malmasi, Shervin},
booktitle={Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)},
year={2023},
publisher={Association for Computational Linguistics}}
@article{multiconer2-data,
title={{MultiCoNER v2: a Large Multilingual dataset for Fine-grained and Noisy Named Entity Recognition}},
author={Fetahu, Besnik and Chen, Zhiyu and Kar, Sudipta and Rokhlenko, Oleg and Malmasi, Shervin},
year={2023}}
multinerd
@inproceedings{tedeschi-navigli-2022-multinerd,
title = "{M}ulti{NERD}: A Multilingual, Multi-Genre and Fine-Grained Dataset for Named Entity Recognition (and Disambiguation)",
author = "Tedeschi, Simone and Navigli, Roberto",
booktitle = "Findings of the Association for Computational Linguistics: NAACL 2022",
month = jul,
year = "2022",
address = "Seattle, United States",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.findings-naacl.60",
doi = "10.18653/v1/2022.findings-naacl.60",
pages = "801--812"}
pii-masking-200k
@misc {ai4privacy_2023,
author = { {ai4Privacy} },
title = { pii-masking-200k (Revision 1d4c0a1) },
year = 2023,
url = { https://huggingface.co/datasets/ai4privacy/pii-masking-200k },
doi = { 10.57967/hf/1532 },
publisher = { Hugging Face }}
wikiann
@inproceedings{rahimi-etal-2019-massively,
title = "Massively Multilingual Transfer for {NER}",
author = "Rahimi, Afshin and Li, Yuan and Cohn, Trevor",
booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2019",
address = "Florence, Italy",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/P19-1015",
pages = "151--164"}
wikiner
@article{NOTHMAN2013151,
title = {Learning multilingual named entity recognition from Wikipedia},
journal = {Artificial Intelligence},
volume = {194},
pages = {151-175},
year = {2013},
note = {Artificial Intelligence, Wikipedia and Semi-Structured Resources},
issn = {0004-3702},
doi = {https://doi.org/10.1016/j.artint.2012.03.006},
url = {https://www.sciencedirect.com/science/article/pii/S0004370212000276},
author = {Joel Nothman and Nicky Ringland and Will Radford and Tara Murphy and James R. Curran}}
frenchNER_3entities
@misc {frenchNER2024,
author = { {BOURDOIS, Loïck} },
organization = { {Centre Aquitain des Technologies de l'Information et Electroniques} },
title = { frenchNER_3entities },
year = 2024,
url = { https://huggingface.co/CATIE-AQ/frenchNER_3entities },
doi = { 10.57967/hf/1751 },
publisher = { Hugging Face }
}