XLM-RoBERTa base Universal Dependencies v2.8 POS tagging: Latin
This model is XLM-RoBERTa base fine-tuned for part-of-speech tagging on the Latin data of Universal Dependencies v2.8. It belongs to a cross-lingual transfer study and is evaluated on over 100 languages.
Quick Start
This model accompanies our paper:
- Make the Best of Cross-lingual Transfer: Evidence from POS Tagging with over 100 Languages
Check the Space for more details.
Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer and the fine-tuned token-classification model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-la")
model = AutoModelForTokenClassification.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-la")
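The snippet above only loads the tokenizer and the model. A minimal sketch of tagging a pre-tokenized sentence might look like the following. The helper names `first_subword_tags` and `tag_words` are hypothetical, and keeping the prediction of each word's first subword is a common convention assumed here, not something this model card confirms. The sketch also assumes that `torch` is installed and that the checkpoint ships a fast tokenizer, which `word_ids()` requires.

```python
from typing import List, Optional, Tuple

MODEL_NAME = "wietsedv/xlm-roberta-base-ft-udpos28-la"


def first_subword_tags(word_ids: List[Optional[int]], tags: List[str]) -> List[str]:
    """Keep one tag per word: the prediction at the word's first subword."""
    result: List[str] = []
    previous = None
    for word_id, tag in zip(word_ids, tags):
        # None marks special tokens; a repeated id marks a continuation piece.
        if word_id is not None and word_id != previous:
            result.append(tag)
        previous = word_id
    return result


def tag_words(words: List[str], model_name: str = MODEL_NAME) -> List[Tuple[str, str]]:
    """Download the checkpoint and return (word, tag) pairs for one sentence."""
    # Imported lazily so first_subword_tags can be used without these packages.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(model_name)
    model.eval()

    # The input is already split into words, so word_ids() can map pieces back.
    encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits  # (1, sequence_length, num_labels)

    label_ids = logits[0].argmax(dim=-1).tolist()
    tags = [model.config.id2label[i] for i in label_ids]
    return list(zip(words, first_subword_tags(encoding.word_ids(0), tags)))
```

Calling `tag_words(["Gallia", "est", "omnis", "divisa", "in", "partes", "tres", "."])` downloads the checkpoint on first use and returns one `(word, tag)` pair per input word. The tag names come from the `id2label` mapping stored in the model's configuration.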
Documentation
| Property | Details |
|---|---|
| Model Type | xlm-roberta-base-ft-udpos28-la |
| Tags | part-of-speech, token-classification |
| Datasets | universal_dependencies |
| Metrics | accuracy |
Model Results
The model was evaluated on the languages below. The table reports test accuracy (in percent) for each:
| Language | Test Accuracy |
|---|---|
| English | 81.5 |
| Dutch | 79.6 |
| German | 78.2 |
| Italian | 78.0 |
| French | 78.1 |
| Spanish | 79.8 |
| Russian | 89.8 |
| Swedish | 86.0 |
| Norwegian | 81.5 |
| Danish | 85.7 |
| Low Saxon | 56.6 |
| Akkadian | 44.7 |
| Armenian | 86.4 |
| Welsh | 65.1 |
| Old East Slavic | 79.8 |
| Albanian | 74.9 |
| Slovenian | 77.4 |
| Guajajara | 35.8 |
| Kurmanji | 77.7 |
| Turkish | 76.9 |
| Finnish | 84.9 |
| Indonesian | 82.0 |
| Ukrainian | 87.8 |
| Polish | 88.0 |
| Portuguese | 82.3 |
| Kazakh | 83.2 |
| Latin | 92.9 |
| Old French | 61.2 |
| Buryat | 64.7 |
| Kaapor | 34.2 |
| Korean | 63.0 |
| Estonian | 85.5 |
| Croatian | 86.3 |
| Gothic | 36.5 |
| Swiss German | 47.8 |
| Assyrian | 15.5 |
| North Sami | 41.4 |
| Naija | 41.9 |
| Latvian | 89.1 |
| Chinese | 44.3 |
| Tagalog | 73.7 |
| Bambara | 27.9 |
| Lithuanian | 88.3 |
| Galician | 81.7 |
| Vietnamese | 68.0 |
| Greek | 74.9 |
| Catalan | 76.2 |
| Czech | 86.3 |
| Erzya | 50.8 |
| Bhojpuri | 52.5 |
| Thai | 61.6 |
| Marathi | 88.3 |
| Basque | 79.0 |
| Slovak | 85.9 |
| Kiche | 39.3 |
| Yoruba | 29.9 |
| Warlpiri | 40.9 |
| Tamil | 85.7 |
| Maltese | 32.8 |
| Ancient Greek | 70.5 |
| Icelandic | 81.6 |
| Mbya Guarani | 33.1 |
| Urdu | 61.3 |
| Romanian | 83.1 |
| Persian | 75.7 |
| Apurina | 43.5 |
| Japanese | 36.5 |
| Hungarian | 74.5 |
| Hindi | 67.0 |
| Classical Chinese | 38.2 |
| Komi Permyak | 52.2 |
| Faroese | 75.6 |
| Sanskrit | 43.5 |
| Livvi | 66.1 |
| Arabic | 81.3 |
| Wolof | 39.1 |
| Bulgarian | 87.7 |
| Akuntsu | 35.5 |
| Makurap | 28.8 |
| Kangri | 49.8 |
| Breton | 59.8 |
| Telugu | 84.3 |
| Cantonese | 50.3 |
| Old Church Slavonic | 55.7 |
| Karelian | 73.0 |
| Upper Sorbian | 76.0 |
| South Levantine Arabic | 68.8 |
| Komi Zyrian | 46.3 |
| Irish | 64.1 |
| Nayini | 44.9 |
| Munduruku | 24.1 |
| Manx | 39.3 |
| Skolt Sami | 43.5 |
| Afrikaans | 74.8 |
| Old Turkish | 37.1 |
| Tupinamba | 45.2 |
| Belarusian | 89.1 |
| Serbian | 87.2 |
| Moksha | 47.3 |
| Western Armenian | 81.6 |
| Scottish Gaelic | 55.3 |
| Khunsari | 43.2 |
| Hebrew | 89.6 |
| Uyghur | 76.8 |
| Chukchi | 36.3 |
License
The model is licensed under the Apache-2.0 license.