XLM-RoBERTa base Universal Dependencies v2.8 POS tagging: Scottish Gaelic
This model performs part-of-speech tagging. It is an XLM-RoBERTa base model fine-tuned on the Scottish Gaelic data of Universal Dependencies v2.8 and evaluated on the other languages in that collection. It was released as part of a study of cross-lingual transfer, described in the associated research paper.
Quick Start
This model was released with the paper:
- Make the Best of Cross-lingual Transfer: Evidence from POS Tagging with over 100 Languages
Check the Space for more details.
Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-gd")
model = AutoModelForTokenClassification.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-gd")
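
The snippet above only loads the tokenizer and model. A minimal sketch of how they might be used to tag a pre-tokenized sentence follows. The helper names `word_level_tags` and `tag_sentence` are hypothetical and not part of the model card, and taking the prediction of the first sub-token of each word is an assumed convention, not necessarily the one used in the paper.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def word_level_tags(
    word_ids: Sequence[Optional[int]],
    token_label_ids: Sequence[int],
    id2label: Dict[int, str],
) -> List[str]:
    """Reduce sub-token predictions to one tag per word.

    Assumption: the tag of a word is the prediction for its first sub-token.
    Special tokens have word id None and are skipped.
    """
    tags: List[str] = []
    seen = set()
    for word_id, label_id in zip(word_ids, token_label_ids):
        if word_id is None or word_id in seen:
            continue  # special token, or a continuation of an already tagged word
        seen.add(word_id)
        tags.append(id2label[label_id])
    return tags


def tag_sentence(
    words: List[str],
    model_name: str = "wietsedv/xlm-roberta-base-ft-udpos28-gd",
) -> List[Tuple[str, str]]:
    """Tag a list of words; downloads the model on first use."""
    # Imported here so that the helper above works without these packages.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(model_name)
    model.eval()

    # is_split_into_words keeps the mapping from sub-tokens back to input words.
    encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits  # shape: (1, num_tokens, num_labels)
    label_ids = logits.argmax(dim=-1)[0].tolist()

    tags = word_level_tags(encoding.word_ids(0), label_ids, model.config.id2label)
    return list(zip(words, tags))


# Example call (requires network access and the torch and transformers packages):
# print(tag_sentence(["Tha", "an", "cù", "mòr", "."]))
```

The tag names returned come from the model's own `id2label` configuration, so the sketch does not hard-code a tag set.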
License
The model is released under the Apache-2.0 license.
Documentation
Model Information
| Property | Details |
|---|---|
| Model Type | xlm-roberta-base-ft-udpos28-gd |
| Tags | part-of-speech, token-classification |
| Datasets | universal_dependencies |
| Metrics | accuracy |
Results
The model's performance on different languages in the Universal Dependencies v2.8 dataset is as follows:
| Language | Test Accuracy (%) |
|---|---|
| English | 75.0 |
| Dutch | 77.8 |
| German | 76.5 |
| Italian | 70.8 |
| French | 74.6 |
| Spanish | 78.7 |
| Russian | 79.2 |
| Swedish | 78.9 |
| Norwegian | 72.7 |
| Danish | 78.0 |
| Low Saxon | 51.0 |
| Akkadian | 47.0 |
| Armenian | 69.2 |
| Welsh | 77.0 |
| Old East Slavic | 70.1 |
| Albanian | 76.1 |
| Slovenian | 64.3 |
| Guajajara | 42.6 |
| Kurmanji | 73.6 |
| Turkish | 71.7 |
| Finnish | 74.4 |
| Indonesian | 74.2 |
| Ukrainian | 78.7 |
| Polish | 81.4 |
| Portuguese | 77.9 |
| Kazakh | 73.3 |
| Latin | 68.8 |
| Old French | 48.7 |
| Buryat | 58.4 |
| Kaapor | 24.6 |
| Korean | 58.9 |
| Estonian | 76.8 |
| Croatian | 74.0 |
| Gothic | 29.4 |
| Swiss German | 48.3 |
| Assyrian | 20.1 |
| North Sami | 44.3 |
| Naija | 40.4 |
| Latvian | 76.7 |
| Chinese | 51.6 |
| Tagalog | 68.3 |
| Bambara | 30.3 |
| Lithuanian | 77.2 |
| Galician | 77.6 |
| Vietnamese | 56.5 |
| Greek | 79.1 |
| Catalan | 74.5 |
| Czech | 78.7 |
| Erzya | 51.6 |
| Bhojpuri | 49.4 |
| Thai | 57.1 |
| Marathi | 72.4 |
| Basque | 65.9 |
| Slovak | 80.3 |
| Kiche | 45.0 |
| Yoruba | 32.5 |
| Warlpiri | 43.7 |
| Tamil | 76.7 |
| Maltese | 34.9 |
| Ancient Greek | 59.3 |
| Icelandic | 73.1 |
| Mbya Guarani | 34.5 |
| Urdu | 56.0 |
| Romanian | 74.4 |
| Persian | 77.3 |
| Apurina | 48.4 |
| Japanese | 38.6 |
| Hungarian | 78.5 |
| Hindi | 60.5 |
| Classical Chinese | 31.6 |
| Komi Permyak | 50.4 |
| Faroese | 71.2 |
| Sanskrit | 33.5 |
| Livvi | 61.6 |
| Arabic | 81.6 |
| Wolof | 38.1 |
| Bulgarian | 76.6 |
| Akuntsu | 39.8 |
| Makurap | 23.3 |
| Kangri | 44.0 |
| Breton | 60.9 |
| Telugu | 74.5 |
| Cantonese | 48.9 |
| Old Church Slavonic | 47.7 |
| Karelian | 65.4 |
| Upper Sorbian | 70.9 |
| South Levantine Arabic | 68.4 |
| Komi Zyrian | 45.0 |
| Irish | 76.6 |
| Nayini | 44.9 |
| Munduruku | 34.0 |
| Manx | 52.0 |
| Skolt Sami | 39.7 |
| Afrikaans | 74.0 |
| Old Turkish | 37.1 |
| Tupinamba | 48.1 |
| Belarusian | 79.7 |
| Serbian | 72.7 |
| Moksha | 49.3 |
| Western Armenian | 68.1 |
| Scottish Gaelic | 93.3 |
| Khunsari | 44.6 |
| Hebrew | 86.5 |
| Uyghur | 67.5 |
| Chukchi | 38.8 |
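
The model card lists accuracy as its metric but does not spell out how it is computed. A minimal sketch, assuming the figures in the results table are token-level accuracy (the percentage of tokens whose predicted tag equals the gold tag), might look like this. The function name `token_accuracy` is hypothetical, and the evaluation procedure used in the paper may differ in details.

```python
from typing import Sequence


def token_accuracy(
    gold: Sequence[Sequence[str]],
    pred: Sequence[Sequence[str]],
) -> float:
    """Return the percentage of tokens whose predicted tag matches the gold tag.

    Assumption: both arguments hold one tag sequence per sentence, aligned
    token by token.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences must have the same length")
        correct += sum(g == p for g, p in zip(gold_sent, pred_sent))
        total += len(gold_sent)
    if total == 0:
        raise ValueError("no tokens to evaluate")
    return 100.0 * correct / total
```

Pooling tokens across all sentences, as done here, weights long sentences more heavily than averaging per-sentence accuracies would.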