XLM-RoBERTa base Universal Dependencies v2.8 POS tagging: Gothic
This model is XLM-RoBERTa base fine-tuned for part-of-speech tagging on Gothic, using Universal Dependencies v2.8. It is part of our research on cross-lingual transfer, described in the paper listed below, which studies how POS tagging transfers across more than 100 languages.
Quick Start
This model accompanies our paper:
- Make the Best of Cross-lingual Transfer: Evidence from POS Tagging with over 100 Languages
Check the Space for more details.
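Features
- Multilingual Support: The underlying XLM-RoBERTa model is multilingual. This Gothic fine-tune was evaluated on POS tagging in over 100 languages, and accuracy varies widely by language (see Results).
- Cross-lingual Transfer: The model is one data point in the paper's study of cross-lingual transfer. It reaches 93.6 test accuracy on Gothic and between 2.7 and 57.4 on the other evaluated languages.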
Installation
No model-specific installation steps are given. The usage example below relies on the transformers library.
Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer and the fine-tuned token-classification model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-got")
model = AutoModelForTokenClassification.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-got")
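
The snippet above only loads the tokenizer and the model. As a minimal sketch of how predictions might then be obtained, the following tags a list of pre-split words and keeps one tag per word. Several things here are assumptions rather than part of the original model card: the helper names `first_subtoken_tags` and `tag_words` are hypothetical, taking the tag of each word's first sub-token is a common convention that may not match how the model was trained or evaluated, and the labels in `model.config.id2label` are assumed to be the POS tags. It also assumes `torch` is installed and that the fast tokenizer is available, since `word_ids()` requires it.

```python
from typing import List, Optional, Tuple

MODEL_ID = "wietsedv/xlm-roberta-base-ft-udpos28-got"


def first_subtoken_tags(
    word_ids: List[Optional[int]], token_tags: List[str]
) -> List[str]:
    """Reduce per-sub-token tags to one tag per word (hypothetical helper).

    Assumption: the tag of a word is the tag predicted for its first sub-token.
    """
    tags: List[str] = []
    seen = set()
    for word_id, tag in zip(word_ids, token_tags):
        # Special tokens have no word id; later sub-tokens of a word are skipped.
        if word_id is None or word_id in seen:
            continue
        seen.add(word_id)
        tags.append(tag)
    return tags


def tag_words(words: List[str]) -> List[Tuple[str, str]]:
    """Tag pre-split words; downloads the model on first use (hypothetical helper)."""
    # Imported lazily so the alignment helper works without these packages.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)
    model.eval()

    # Tell the tokenizer the input is already split into words.
    encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits  # shape: (1, num_subtokens, num_labels)

    predicted_ids = logits.argmax(dim=-1)[0].tolist()
    token_tags = [model.config.id2label[i] for i in predicted_ids]
    word_tags = first_subtoken_tags(encoding.word_ids(batch_index=0), token_tags)
    return list(zip(words, word_tags))
```

Calling `tag_words(["jah", "qaþ", "du", "im"])` would return a list of `(word, tag)` pairs for that transliterated Gothic phrase. The words are chosen purely for illustration, and no particular output is claimed here.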
Advanced Usage
No advanced usage example is provided for this model.
Documentation
Model Information
| Property | Details |
|---|---|
| Model Type | XLM-RoBERTa base fine-tuned for Universal Dependencies v2.8 POS tagging (Gothic) |
| Training Data | Universal Dependencies v2.8 |
Results
Test accuracy of this Gothic fine-tuned model on each evaluation language:
| Language | Test Accuracy (%) |
|---|---|
| English | 47.9 |
| Dutch | 50.2 |
| German | 38.9 |
| Italian | 46.8 |
| French | 50.2 |
| Spanish | 51.3 |
| Russian | 52.4 |
| Swedish | 51.5 |
| Norwegian | 49.1 |
| Danish | 50.8 |
| Low Saxon | 32.8 |
| Akkadian | 43.8 |
| Armenian | 50.4 |
| Welsh | 41.1 |
| Old East Slavic | 53.9 |
| Albanian | 49.0 |
| Slovenian | 45.3 |
| Guajajara | 23.8 |
| Kurmanji | 49.3 |
| Turkish | 46.6 |
| Finnish | 51.2 |
| Indonesian | 55.4 |
| Ukrainian | 50.0 |
| Polish | 52.4 |
| Portuguese | 50.4 |
| Kazakh | 46.5 |
| Latin | 49.1 |
| Old French | 47.6 |
| Buryat | 37.4 |
| Kaapor | 33.8 |
| Korean | 41.5 |
| Estonian | 49.5 |
| Croatian | 57.2 |
| Gothic | 93.6 |
| Swiss German | 25.1 |
| Assyrian | 4.0 |
| North Sami | 27.9 |
| Naija | 29.2 |
| Latvian | 51.5 |
| Chinese | 16.4 |
| Tagalog | 42.0 |
| Bambara | 13.1 |
| Lithuanian | 50.5 |
| Galician | 49.2 |
| Vietnamese | 47.1 |
| Greek | 42.0 |
| Catalan | 50.1 |
| Czech | 54.3 |
| Erzya | 22.1 |
| Bhojpuri | 38.8 |
| Thai | 34.7 |
| Marathi | 35.0 |
| Basque | 45.9 |
| Slovak | 55.3 |
| Kiche | 23.3 |
| Yoruba | 15.0 |
| Warlpiri | 23.5 |
| Tamil | 41.1 |
| Maltese | 21.4 |
| Ancient Greek | 50.9 |
| Icelandic | 50.3 |
| Mbya Guarani | 14.8 |
| Urdu | 41.4 |
| Romanian | 50.1 |
| Persian | 53.1 |
| Apurina | 20.8 |
| Japanese | 16.3 |
| Hungarian | 42.3 |
| Hindi | 45.2 |
| Classical Chinese | 19.6 |
| Komi Permyak | 23.4 |
| Faroese | 48.9 |
| Sanskrit | 32.4 |
| Livvi | 38.5 |
| Arabic | 49.6 |
| Wolof | 28.4 |
| Bulgarian | 55.6 |
| Akuntsu | 25.2 |
| Makurap | 18.5 |
| Kangri | 34.2 |
| Breton | 36.7 |
| Telugu | 38.8 |
| Cantonese | 17.1 |
| Old Church Slavonic | 50.2 |
| Karelian | 41.7 |
| Upper Sorbian | 42.7 |
| South Levantine Arabic | 38.9 |
| Komi Zyrian | 21.1 |
| Irish | 37.2 |
| Nayini | 33.3 |
| Munduruku | 26.6 |
| Manx | 17.6 |
| Skolt Sami | 19.9 |
| Afrikaans | 45.9 |
| Old Turkish | 2.7 |
| Tupinamba | 23.4 |
| Belarusian | 53.0 |
| Serbian | 57.4 |
| Moksha | 24.5 |
| Western Armenian | 47.2 |
| Scottish Gaelic | 36.7 |
| Khunsari | 28.4 |
| Hebrew | 44.8 |
| Uyghur | 48.6 |
| Chukchi | 21.0 |
License
This model is released under the Apache 2.0 license.