XLM-RoBERTa base Universal Dependencies v2.8 POS tagging: English
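This model performs part-of-speech (POS) tagging. It is XLM-RoBERTa base fine-tuned on the English data of Universal Dependencies v2.8, and it can be applied to other languages through zero-shot cross-lingual transfer. Accuracy is highest on English (96.0) and varies widely across other languages, from about 90 on languages such as Swedish, Dutch and Russian down to below 10 on Kaapor and Makurap (see Model Results).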
Quick Start
This model is part of our paper:
- Make the Best of Cross-lingual Transfer: Evidence from POS Tagging with over 100 Languages

See the accompanying Space for more details.
Usage Examples
Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer and the fine-tuned POS-tagging (token-classification) model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-en")
model = AutoModelForTokenClassification.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-en")
```
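Loading the model is only the first step. To get one tag per word, the sketch below tokenizes a pre-split sentence, takes the highest-scoring label for each subword token, and keeps the label of each word's first subword. The first-subword strategy, the helper names `first_subword_labels` and `tag_words`, and the use of PyTorch are assumptions for illustration, not necessarily how the authors decode predictions; the label names are whatever `model.config.id2label` contains.

```python
from typing import Dict, List, Optional


def first_subword_labels(
    word_ids: List[Optional[int]],
    pred_ids: List[int],
    id2label: Dict[int, str],
) -> List[str]:
    """Return one label per word, taken from the word's first subword token.

    word_ids: for each token, the index of the word it belongs to, or None
              for special tokens such as <s> and </s>.
    pred_ids: the predicted label id for each token (argmax over the logits).
    """
    labels: List[str] = []
    previous = None
    for word_id, pred in zip(word_ids, pred_ids):
        # Skip special tokens and continuation subwords of the same word.
        if word_id is None or word_id == previous:
            continue
        labels.append(id2label[pred])
        previous = word_id
    return labels


def tag_words(words: List[str]) -> List[str]:
    """Hypothetical helper: tag a pre-split sentence with this model."""
    # Imported here so first_subword_labels can be used without these dependencies.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    name = "wietsedv/xlm-roberta-base-ft-udpos28-en"
    # Loading on every call keeps the sketch short; cache both objects in real use.
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForTokenClassification.from_pretrained(name)
    model.eval()

    # is_split_into_words=True lets the fast tokenizer map tokens back to our words.
    encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits  # shape: (1, num_tokens, num_labels)
    pred_ids = logits.argmax(dim=-1)[0].tolist()
    return first_subword_labels(encoding.word_ids(0), pred_ids, model.config.id2label)
```

For example, `tag_words(["She", "reads", "books", "."])` returns one label per input word. Keeping the alignment logic in a separate pure function makes it reusable with other token-classification models and easy to check without downloading anything.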
License
The model is released under the Apache-2.0 license.
Documentation
Model Information
| Property | Details |
| --- | --- |
| Library Name | transformers |
| Tags | part-of-speech, token-classification |
| Datasets | universal_dependencies |
| Metrics | accuracy |
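The accuracy metric is the share of tagged tokens whose predicted tag matches the gold annotation, expressed as a percentage. A minimal sketch, assuming word-level tags compared one-to-one against gold tags (the evaluation script actually used for the paper may differ, and `tagging_accuracy` is a hypothetical name):

```python
from typing import Sequence


def tagging_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Percentage of positions whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    # Count exact tag matches position by position.
    correct = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct / len(gold)
```

For a four-word sentence with one wrong tag, the function returns 75.0.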
Model Results
The model `xlm-roberta-base-ft-udpos28-en` achieves the following test-set accuracies (%) for part-of-speech tagging on the Universal Dependencies v2.8 treebanks of each language:
| Language | Test accuracy (%) |
| --- | --- |
| English | 96.0 |
| Dutch | 90.4 |
| German | 88.6 |
| Italian | 87.8 |
| French | 87.4 |
| Spanish | 90.3 |
| Russian | 91.0 |
| Swedish | 94.0 |
| Norwegian | 89.6 |
| Danish | 91.6 |
| Low Saxon | 57.4 |
| Akkadian | 26.4 |
| Armenian | 88.5 |
| Welsh | 70.6 |
| Old East Slavic | 76.5 |
| Albanian | 82.3 |
| Slovenian | 79.0 |
| Guajajara | 17.2 |
| Kurmanji | 76.9 |
| Turkish | 79.1 |
| Finnish | 87.2 |
| Indonesian | 86.9 |
| Ukrainian | 87.6 |
| Polish | 87.2 |
| Portuguese | 90.0 |
| Kazakh | 82.5 |
| Latin | 79.6 |
| Old French | 53.4 |
| Buryat | 58.8 |
| Kaapor | 9.2 |
| Korean | 64.0 |
| Estonian | 88.4 |
| Croatian | 87.9 |
| Gothic | 20.5 |
| Swiss German | 47.6 |
| Assyrian | 14.6 |
| North Sami | 32.0 |
| Naija | 47.5 |
| Latvian | 87.5 |
| Chinese | 47.5 |
| Tagalog | 73.5 |
| Bambara | 27.7 |
| Lithuanian | 87.3 |
| Galician | 87.1 |
| Vietnamese | 66.4 |
| Greek | 87.6 |
| Catalan | 89.7 |
| Czech | 88.1 |
| Erzya | 47.6 |
| Bhojpuri | 50.7 |
| Thai | 59.5 |
| Marathi | 82.2 |
| Basque | 76.0 |
| Slovak | 88.5 |
| Kiche | 25.4 |
| Yoruba | 18.5 |
| Warlpiri | 29.1 |
| Tamil | 83.4 |
| Maltese | 21.1 |
| Ancient Greek | 66.8 |
| Icelandic | 84.8 |
| Mbya Guarani | 24.1 |
| Urdu | 67.0 |
| Romanian | 85.7 |
| Persian | 76.7 |
| Apurina | 28.6 |
| Japanese | 34.1 |
| Hungarian | 86.0 |
| Hindi | 74.1 |
| Classical Chinese | 29.4 |
| Komi Permyak | 47.4 |
| Faroese | 77.0 |
| Sanskrit | 25.6 |
| Livvi | 63.2 |
| Arabic | 80.7 |
| Wolof | 26.1 |
| Bulgarian | 90.8 |
| Akuntsu | 18.3 |
| Makurap | 5.5 |
| Kangri | 43.0 |
| Breton | 64.1 |
| Telugu | 84.7 |
| Cantonese | 54.0 |
| Old Church Slavonic | 53.7 |
| Karelian | 69.7 |
| Upper Sorbian | 75.6 |
| South Levantine Arabic | 66.3 |
| Komi Zyrian | 39.9 |
| Irish | 67.0 |
| Nayini | 44.9 |
| Munduruku | 12.3 |
| Manx | 25.4 |
| Skolt Sami | 29.9 |
| Afrikaans | 89.3 |
| Old Turkish | 37.1 |
| Tupinamba | 23.1 |
| Belarusian | 89.1 |
| Serbian | 88.4 |
| Moksha | 44.1 |
| Western Armenian | 80.1 |
| Scottish Gaelic | 59.0 |
| Khunsari | 43.2 |
| Hebrew | 90.6 |
| Uyghur | 75.8 |
| Chukchi | 32.6 |
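Because accuracy ranges from 96.0 on English to 5.5 on Makurap, it can help to shortlist target languages where transfer from English works well before relying on the tags. The sketch below ranks a small, hand-copied subset of the results table and keeps languages at or above a threshold; the subset, the threshold, and the helper name `languages_at_or_above` are illustrative choices, not recommendations from the paper.

```python
from typing import Dict, List

# Illustrative subset copied from the results table above (test accuracy, %).
SCORES: Dict[str, float] = {
    "English": 96.0,
    "Swedish": 94.0,
    "Danish": 91.6,
    "Russian": 91.0,
    "Korean": 64.0,
    "Japanese": 34.1,
    "Makurap": 5.5,
}


def languages_at_or_above(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return languages whose accuracy meets the threshold, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [language for language, accuracy in ranked if accuracy >= threshold]
```

With a threshold of 90.0, this subset yields English, Swedish, Danish and Russian, in that order.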