XLM-RoBERTa base Universal Dependencies v2.8 POS tagging: Naija
This model is XLM-RoBERTa base fine-tuned for part-of-speech tagging on Naija (pcm). It can also be applied to other languages through cross-lingual transfer, with the accuracies reported under Model Results.
Quick Start
This model is part of our paper called:
- Make the Best of Cross-lingual Transfer: Evidence from POS Tagging with over 100 Languages
Check the Space for more details.
Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-pcm")
model = AutoModelForTokenClassification.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-pcm")
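The snippet above only loads the tokenizer and the model. A minimal sketch of how one might go from there to one tag per word is shown below. It is not taken from the paper: the helper names `first_subword_tags` and `tag_words` are hypothetical, and the choice to keep the prediction of each word's first sub-token is an assumption (a common convention for token classification). It also assumes `torch` is installed and that the tokenizer is a fast tokenizer, so that `word_ids()` is available.

```python
from typing import Dict, List, Optional, Sequence


def first_subword_tags(
    word_ids: Sequence[Optional[int]],
    label_ids: Sequence[int],
    id2label: Dict[int, str],
) -> List[str]:
    """Keep one tag per word: the prediction at the word's first sub-token.

    Special tokens have word id None and are skipped.
    """
    tags: List[str] = []
    seen = set()
    for word_id, label_id in zip(word_ids, label_ids):
        if word_id is None or word_id in seen:
            continue
        seen.add(word_id)
        tags.append(id2label[label_id])
    return tags


def tag_words(
    words: List[str],
    model_name: str = "wietsedv/xlm-roberta-base-ft-udpos28-pcm",
) -> List[str]:
    """Tag a pre-split sentence (hypothetical helper, downloads the model)."""
    # Imported lazily so the pure helper above works without these packages.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(model_name)
    model.eval()

    # Tell the tokenizer the input is already split into words.
    encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits

    # Highest-scoring label id for every sub-token of the single sentence.
    label_ids = logits.argmax(dim=-1)[0].tolist()
    return first_subword_tags(encoding.word_ids(0), label_ids, model.config.id2label)
```

Calling `tag_words(["How", "you", "dey", "?"])` would return a list with one label per input word, using whatever label names are stored in the model's `id2label` configuration.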
Documentation
| Property | Details |
| --- | --- |
| Language | pcm |
| License | apache-2.0 |
| Library Name | transformers |
| Tags | part-of-speech, token-classification |
| Datasets | universal_dependencies |
| Metrics | accuracy |
Model Results
The model xlm-roberta-base-ft-udpos28-pcm achieves the following test accuracies on the token-classification task (part-of-speech tagging) on the Universal Dependencies v2.8 dataset:
| Language | Test Accuracy (%) |
| --- | --- |
| English | 77.2 |
| Dutch | 75.2 |
| German | 73.2 |
| Italian | 68.9 |
| French | 74.0 |
| Spanish | 75.1 |
| Russian | 70.3 |
| Swedish | 78.9 |
| Norwegian | 74.3 |
| Danish | 73.4 |
| Low Saxon | 37.9 |
| Akkadian | 28.0 |
| Armenian | 65.4 |
| Welsh | 59.7 |
| Old East Slavic | 61.0 |
| Albanian | 66.1 |
| Slovenian | 67.6 |
| Guajajara | 16.1 |
| Kurmanji | 54.8 |
| Turkish | 58.2 |
| Finnish | 67.4 |
| Indonesian | 68.5 |
| Ukrainian | 68.1 |
| Polish | 68.8 |
| Portuguese | 72.9 |
| Kazakh | 60.1 |
| Latin | 64.3 |
| Old French | 51.1 |
| Buryat | 38.9 |
| Kaapor | 16.7 |
| Korean | 52.4 |
| Estonian | 68.3 |
| Croatian | 73.0 |
| Gothic | 21.4 |
| Swiss German | 33.4 |
| Assyrian | 0.0 |
| North Sami | 24.3 |
| Naija | 97.9 |
| Latvian | 66.3 |
| Chinese | 34.3 |
| Tagalog | 49.9 |
| Bambara | 16.7 |
| Lithuanian | 65.7 |
| Galician | 72.4 |
| Vietnamese | 54.3 |
| Greek | 73.3 |
| Catalan | 73.6 |
| Czech | 69.5 |
| Erzya | 22.1 |
| Bhojpuri | 36.6 |
| Thai | 65.4 |
| Marathi | 50.3 |
| Basque | 58.5 |
| Slovak | 70.4 |
| Kiche | 8.0 |
| Yoruba | 6.1 |
| Warlpiri | 15.4 |
| Tamil | 60.1 |
| Maltese | 12.2 |
| Ancient Greek | 45.8 |
| Icelandic | 72.5 |
| Mbya Guarani | 11.4 |
| Urdu | 59.1 |
| Romanian | 64.8 |
| Persian | 67.2 |
| Apurina | 15.5 |
| Japanese | 26.1 |
| Hungarian | 68.6 |
| Hindi | 65.0 |
| Classical Chinese | 30.4 |
| Komi Permyak | 21.2 |
| Faroese | 61.6 |
| Sanskrit | 25.6 |
| Livvi | 39.7 |
| Arabic | 63.5 |
| Wolof | 15.9 |
| Bulgarian | 74.6 |
| Akuntsu | 26.5 |
| Makurap | 11.6 |
| Kangri | 27.8 |
| Breton | 46.6 |
| Telugu | 59.4 |
| Cantonese | 30.7 |
| Old Church Slavonic | 36.7 |
| Karelian | 45.9 |
| Upper Sorbian | 49.3 |
| South Levantine Arabic | 42.5 |
| Komi Zyrian | 18.4 |
| Irish | 48.3 |
| Nayini | 24.4 |
| Munduruku | 16.1 |
| Manx | 14.7 |
| Skolt Sami | 5.4 |
| Afrikaans | 76.5 |
| Old Turkish | 0.0 |
| Tupinamba | 16.3 |
| Belarusian | 70.7 |
| Serbian | 74.8 |
| Moksha | 24.1 |
| Western Armenian | 59.8 |
| Scottish Gaelic | 45.4 |
| Khunsari | 21.6 |
| Hebrew | 65.6 |
| Uyghur | 55.0 |
| Chukchi | 12.6 |
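The evaluation script behind these numbers is not shown here. As a rough illustration of the metric, the sketch below assumes that accuracy means the share of tokens whose predicted tag equals the gold tag, pooled over all sentences and expressed as a percentage. The function name `token_accuracy` and the toy tag sequences are hypothetical and are not drawn from any treebank.

```python
from typing import Sequence


def token_accuracy(
    gold: Sequence[Sequence[str]],
    pred: Sequence[Sequence[str]],
) -> float:
    """Percentage of tokens whose predicted tag equals the gold tag."""
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences must have equal length")
        # Count matching tags position by position.
        correct += sum(g == p for g, p in zip(gold_sent, pred_sent))
        total += len(gold_sent)
    # Avoid division by zero on an empty evaluation set.
    return 100.0 * correct / total if total else 0.0


# Toy data (made up for illustration): 5 of 7 tags match.
gold = [["PRON", "VERB", "NOUN"], ["ADV", "PRON", "AUX", "PUNCT"]]
pred = [["PRON", "VERB", "PROPN"], ["ADV", "PRON", "VERB", "PUNCT"]]
print(round(token_accuracy(gold, pred), 1))  # prints 71.4
```

Because the count is pooled over tokens rather than averaged per sentence, long sentences weigh more than short ones under this assumption.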
License
The model is released under the Apache-2.0 license.