XLM-RoBERTa base Universal Dependencies v2.8 POS tagging: Old East Slavic
This model is XLM-RoBERTa base fine-tuned for part-of-speech tagging on the Old East Slavic data of the Universal Dependencies v2.8 dataset. It was released as part of a study of cross-lingual transfer in POS tagging across more than 100 languages.
Quick Start
This model was released with our paper:
- Make the Best of Cross-lingual Transfer: Evidence from POS Tagging with over 100 Languages
Check the Space for more details.
⨠Features
- Multilingual Support: Capable of performing part - of - speech tagging on over 100 languages.
- High - performance Metrics: Demonstrates varying levels of accuracy across different languages, as shown in the results section.
Installation
No model-specific installation steps are documented; the model is loaded through the transformers library.
Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-orv")
model = AutoModelForTokenClassification.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-orv")
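
The snippet above only loads the tokenizer and model. The following is a minimal sketch of how they could be used to tag a pre-tokenized sentence. The helper names `align_word_tags` and `tag_words` are hypothetical, and taking the prediction of the first subword of each word is an assumption, not necessarily the procedure used in the paper.

```python
from typing import Dict, List, Optional, Sequence


def align_word_tags(
    word_ids: Sequence[Optional[int]],
    pred_ids: Sequence[int],
    id2label: Dict[int, str],
) -> List[str]:
    """Keep the predicted label of the first subword of each word.

    word_ids maps each subword position to its word index (None for
    special tokens), as returned by BatchEncoding.word_ids() of a fast
    tokenizer. Assumption: the first subword represents the whole word.
    """
    tags: List[str] = []
    seen = set()
    for position, word_id in enumerate(word_ids):
        if word_id is None or word_id in seen:
            continue  # skip special tokens and non-initial subwords
        seen.add(word_id)
        tags.append(id2label[pred_ids[position]])
    return tags


def tag_words(words: List[str], tokenizer, model) -> List[str]:
    """Tag a list of words with a loaded token-classification model."""
    import torch  # imported lazily so align_word_tags has no heavy dependencies

    encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits  # shape: (1, seq_len, num_labels)
    pred_ids = logits.argmax(dim=-1)[0].tolist()
    return align_word_tags(encoding.word_ids(0), pred_ids, model.config.id2label)
```

With the `tokenizer` and `model` loaded as shown earlier, `tag_words(words, tokenizer, model)` returns one label per entry of `words`, where `words` is a list of word strings.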
Documentation
Model Information

| Property | Details |
|---|---|
| Model Type | xlm-roberta-base-ft-udpos28-orv |
| Training Data | Universal Dependencies v2.8 |
| Library Name | transformers |
| Tags | part-of-speech, token-classification |

Results
The model was evaluated on the languages below, using accuracy on each language's test set. Here are some of the results:

| Language | Test Accuracy (%) |
|---|---|
| English | 79.4 |
| Dutch | 77.8 |
| German | 79.3 |
| Italian | 77.5 |
| French | 75.2 |
| Spanish | 77.2 |
| Russian | 87.9 |
| Swedish | 83.0 |
| Norwegian | 78.6 |
| Danish | 82.9 |
| Low Saxon | 58.9 |
| Akkadian | 41.8 |
| Armenian | 82.7 |
| Welsh | 64.3 |
| Old East Slavic | 91.0 |
| Albanian | 73.4 |
| Slovenian | 73.8 |
| Guajajara | 41.7 |
| Kurmanji | 76.7 |
| Turkish | 73.5 |
| Finnish | 83.0 |
| Indonesian | 78.9 |
| Ukrainian | 86.7 |
| Polish | 85.5 |
| Portuguese | 79.5 |
| Kazakh | 79.7 |
| Latin | 80.9 |
| Old French | 60.5 |
| Buryat | 59.8 |
| Kaapor | 27.1 |
| Korean | 61.0 |
| Estonian | 83.9 |
| Croatian | 84.7 |
| Gothic | 33.1 |
| Swiss German | 53.5 |
| Assyrian | 15.7 |
| North Sami | 39.9 |
| Naija | 41.9 |
| Latvian | 85.7 |
| Chinese | 42.7 |
| Tagalog | 73.5 |
| Bambara | 29.5 |
| Lithuanian | 86.1 |
| Galician | 77.7 |
| Vietnamese | 64.8 |
| Greek | 73.8 |
| Catalan | 74.2 |
| Czech | 85.0 |
| Erzya | 46.1 |
| Bhojpuri | 56.8 |
| Thai | 60.6 |
| Marathi | 84.0 |
| Basque | 77.2 |
| Slovak | 84.3 |
| Kiche | 35.3 |
| Yoruba | 29.9 |
| Warlpiri | 33.6 |
| Tamil | 84.3 |
| Maltese | 32.0 |
| Ancient Greek | 65.7 |
| Icelandic | 81.6 |
| Mbya Guarani | 33.2 |
| Urdu | 66.2 |
| Romanian | 80.9 |
| Persian | 74.6 |
| Apurina | 44.6 |
| Japanese | 35.7 |
| Hungarian | 73.3 |
| Hindi | 75.3 |
| Classical Chinese | 41.5 |
| Komi Permyak | 49.0 |
| Faroese | 78.3 |
| Sanskrit | 43.3 |
| Livvi | 70.2 |
| Arabic | 79.8 |
| Wolof | 39.8 |
| Bulgarian | 85.8 |
| Akuntsu | 36.5 |
| Makurap | 14.4 |
| Kangri | 52.0 |
| Breton | 58.1 |
| Telugu | 79.9 |
| Cantonese | 50.8 |
| Old Church Slavonic | 78.2 |
| Karelian | 73.5 |
| Upper Sorbian | 76.0 |
| South Levantine Arabic | 70.0 |
| Komi Zyrian | 43.1 |
| Irish | 61.1 |
| Nayini | 53.8 |
| Munduruku | 26.4 |
| Manx | 44.6 |
| Skolt Sami | 45.2 |
| Afrikaans | 76.9 |
| Old Turkish | 2.7 |
| Tupinamba | 39.0 |
| Belarusian | 89.5 |
| Serbian | 85.1 |
| Moksha | 42.8 |
| Western Armenian | 77.0 |
| Scottish Gaelic | 51.6 |
| Khunsari | 54.1 |
| Hebrew | 85.4 |
| Uyghur | 74.4 |
| Chukchi | 34.5 |

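
The scores in this section are reported as accuracy. As an illustration, here is a minimal sketch of how such a score could be computed, assuming it is the percentage of tokens whose predicted tag matches the gold tag. The function name `tagging_accuracy` is hypothetical, and the exact evaluation script used for the reported numbers is not given in this document.

```python
from typing import Iterable, List


def tagging_accuracy(
    gold: Iterable[List[str]], predicted: Iterable[List[str]]
) -> float:
    """Return token-level accuracy as a percentage.

    gold and predicted are parallel collections of sentences, each
    sentence being a list of tags with one tag per token.
    """
    correct = 0
    total = 0
    for gold_tags, pred_tags in zip(gold, predicted):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("gold and predicted sentences differ in length")
        # Count the positions where the predicted tag equals the gold tag.
        correct += sum(g == p for g, p in zip(gold_tags, pred_tags))
        total += len(gold_tags)
    if total == 0:
        raise ValueError("no tokens to score")
    return 100.0 * correct / total
```

Pooling all tokens before dividing (rather than averaging per-sentence scores) keeps long sentences from being under-weighted; this is a design choice of the sketch, not a documented property of the reported results.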
License
The model is released under the Apache-2.0 license.