đ XLM-RoBERTa base Universal Dependencies v2.8 POS tagging: Icelandic
This model is designed for part - of - speech tagging in Icelandic, leveraging the XLM - RoBERTa base architecture on Universal Dependencies v2.8 dataset. It provides a solution for cross - lingual transfer in POS tagging across over 100 languages.
đ Quick Start
This model is part of our paper called:
- Make the Best of Cross - lingual Transfer: Evidence from POS Tagging with over 100 Languages
Check the Space for more details.
⨠Features
- Multilingual Support: Capable of part - of - speech tagging in over 100 languages.
- High Accuracy: Demonstrates high accuracy in various languages as shown in the performance metrics.
đģ Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-is")
model = AutoModelForTokenClassification.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-is")
đ Documentation
Model Information
Property |
Details |
Model Type |
xlm - roberta - base - ft - udpos28 - is |
Training Data |
Universal Dependencies v2.8 |
Library Name |
transformers |
Tags |
part - of - speech, token - classification |
Performance Metrics
The model has been evaluated on multiple languages with the following accuracy results:
Language |
Test Accuracy |
English |
88.4 |
Dutch |
86.9 |
German |
82.7 |
Italian |
84.6 |
French |
83.6 |
Spanish |
83.6 |
Russian |
87.6 |
Swedish |
89.9 |
Norwegian |
86.4 |
Danish |
89.6 |
Low Saxon |
57.6 |
Akkadian |
30.5 |
Armenian |
86.6 |
Welsh |
66.9 |
Old East Slavic |
76.3 |
Albanian |
80.8 |
Slovenian |
76.8 |
Guajajara |
31.8 |
Kurmanji |
78.6 |
Turkish |
77.3 |
Finnish |
84.8 |
Indonesian |
84.4 |
Ukrainian |
85.9 |
Polish |
84.2 |
Portuguese |
86.6 |
Kazakh |
81.8 |
Latin |
75.8 |
Old French |
58.6 |
Buryat |
63.1 |
Kaapor |
18.3 |
Korean |
64.3 |
Estonian |
86.7 |
Croatian |
86.0 |
Gothic |
26.6 |
Swiss German |
45.6 |
Assyrian |
15.5 |
North Sami |
43.9 |
Naija |
46.6 |
Latvian |
85.3 |
Chinese |
60.4 |
Tagalog |
80.0 |
Bambara |
32.5 |
Lithuanian |
85.9 |
Galician |
80.7 |
Vietnamese |
64.1 |
Greek |
80.5 |
Catalan |
82.7 |
Czech |
84.6 |
Erzya |
52.8 |
Bhojpuri |
59.0 |
Thai |
68.2 |
Marathi |
87.1 |
Basque |
79.5 |
Slovak |
86.0 |
Kiche |
42.2 |
Yoruba |
34.3 |
Warlpiri |
43.7 |
Tamil |
83.9 |
Maltese |
27.5 |
Ancient Greek |
64.0 |
Icelandic |
95.6 |
Mbya Guarani |
31.9 |
Urdu |
72.7 |
Romanian |
82.0 |
Persian |
78.3 |
Apurina |
47.9 |
Japanese |
44.0 |
Hungarian |
77.2 |
Hindi |
77.4 |
Classical Chinese |
46.0 |
Komi Permyak |
52.7 |
Faroese |
83.9 |
Sanskrit |
37.4 |
Livvi |
66.8 |
Arabic |
79.2 |
Wolof |
39.9 |
Bulgarian |
87.7 |
Akuntsu |
37.0 |
Makurap |
24.7 |
Kangri |
50.2 |
Breton |
61.8 |
Telugu |
84.5 |
Cantonese |
60.6 |
Old Church Slavonic |
53.9 |
Karelian |
74.0 |
Upper Sorbian |
75.5 |
South Levantine Arabic |
70.8 |
Komi Zyrian |
47.1 |
Irish |
66.8 |
Nayini |
43.6 |
Munduruku |
28.3 |
Manx |
48.6 |
Skolt Sami |
39.6 |
Afrikaans |
87.4 |
Old Turkish |
38.9 |
Tupinamba |
37.6 |
Belarusian |
86.8 |
Serbian |
87.2 |
Moksha |
49.8 |
Western Armenian |
79.9 |
Scottish Gaelic |
56.8 |
Khunsari |
52.7 |
Hebrew |
85.4 |
Uyghur |
76.9 |
Chukchi |
37.7 |
đ License
This model is released under the Apache 2.0 license.