XLM-RoBERTa base Universal Dependencies v2.8 POS tagging: Lithuanian
This model performs part-of-speech tagging. It is based on the XLM-RoBERTa base architecture and was trained on Universal Dependencies v2.8 data for Lithuanian. It is part of a research effort to optimize cross-lingual transfer in POS tagging across over 100 languages.
Quick Start
This model accompanies our paper:
- Make the Best of Cross-lingual Transfer: Evidence from POS Tagging with over 100 Languages
Check the Space for more details.
Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer and the fine-tuned token classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-lt")
model = AutoModelForTokenClassification.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-lt")
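The loading code above stops before inference. The following is a minimal sketch of how the model might be used to tag a pre-split sentence. The helper names `first_subword_tags` and `tag_words` are hypothetical, and taking the prediction of the first subword of each word is an assumption made for illustration, not necessarily the aggregation used in the paper. It also assumes `torch` is installed and that the tokenizer is a fast tokenizer, which is needed for `word_ids()`.

```python
def first_subword_tags(word_ids, pred_ids, id2label):
    """Map subword predictions to one tag per word.

    Assumption: the tag of a word is the prediction for its first subword.
    Special tokens have a word id of None and are skipped.
    """
    tags = []
    seen = set()
    for word_id, pred_id in zip(word_ids, pred_ids):
        if word_id is None or word_id in seen:
            continue
        seen.add(word_id)
        tags.append(id2label[pred_id])
    return tags


def tag_words(words, model_name="wietsedv/xlm-roberta-base-ft-udpos28-lt"):
    """Return (word, tag) pairs for a list of words (hypothetical helper)."""
    # Imported here so the pure-Python helper above works without these packages.
    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(model_name)
    model.eval()

    # Tell the tokenizer the input is already split into words
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits

    # Highest-scoring label id for every subword of the single input sentence
    pred_ids = logits[0].argmax(dim=-1).tolist()
    tags = first_subword_tags(enc.word_ids(0), pred_ids, model.config.id2label)
    return list(zip(words, tags))


# Example call (downloads the model on first use):
# print(tag_words(["Vilnius", "yra", "Lietuvos", "sostinė", "."]))
```

Keeping the alignment step in its own function makes it possible to check it without downloading the model.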
License
The model is released under the Apache-2.0 license.
Documentation
Model Information
| Property | Details |
|---|---|
| Model Type | XLM-RoBERTa base fine-tuned for Universal Dependencies v2.8 POS tagging (Lithuanian) |
| Training Data | Universal Dependencies v2.8 |
Results
The model was evaluated on multiple languages, using accuracy as the metric. The results are:
| Language | Test accuracy (%) |
|---|---|
| English | 80.7 |
| Dutch | 80.6 |
| German | 76.0 |
| Italian | 77.8 |
| French | 75.5 |
| Spanish | 79.6 |
| Russian | 88.9 |
| Swedish | 81.6 |
| Norwegian | 76.3 |
| Danish | 78.9 |
| Low Saxon | 52.0 |
| Akkadian | 31.6 |
| Armenian | 84.1 |
| Welsh | 63.8 |
| Old East Slavic | 75.6 |
| Albanian | 76.8 |
| Slovenian | 81.4 |
| Guajajara | 26.7 |
| Kurmanji | 77.1 |
| Turkish | 74.9 |
| Finnish | 83.2 |
| Indonesian | 78.0 |
| Ukrainian | 88.1 |
| Polish | 86.3 |
| Portuguese | 81.6 |
| Kazakh | 83.1 |
| Latin | 78.7 |
| Old French | 56.1 |
| Buryat | 64.3 |
| Kaapor | 22.5 |
| Korean | 64.6 |
| Estonian | 81.5 |
| Croatian | 86.6 |
| Gothic | 22.6 |
| Swiss German | 48.1 |
| Assyrian | 14.6 |
| North Sami | 39.8 |
| Naija | 41.4 |
| Latvian | 89.0 |
| Chinese | 34.4 |
| Tagalog | 73.0 |
| Bambara | 26.4 |
| Lithuanian | 96.1 |
| Galician | 81.1 |
| Vietnamese | 65.3 |
| Greek | 81.8 |
| Catalan | 76.2 |
| Czech | 86.5 |
| Erzya | 48.7 |
| Bhojpuri | 50.9 |
| Thai | 54.5 |
| Marathi | 82.8 |
| Basque | 75.6 |
| Slovak | 88.5 |
| Kiche | 33.5 |
| Yoruba | 24.6 |
| Warlpiri | 44.1 |
| Tamil | 79.1 |
| Maltese | 25.5 |
| Ancient Greek | 65.8 |
| Icelandic | 80.7 |
| Mbya Guarani | 32.2 |
| Urdu | 59.1 |
| Romanian | 78.6 |
| Persian | 72.8 |
| Apurina | 42.0 |
| Japanese | 22.9 |
| Hungarian | 76.9 |
| Hindi | 62.2 |
| Classical Chinese | 15.8 |
| Komi Permyak | 48.3 |
| Faroese | 77.3 |
| Sanskrit | 41.0 |
| Livvi | 67.2 |
| Arabic | 73.9 |
| Wolof | 28.0 |
| Bulgarian | 85.9 |
| Akuntsu | 26.0 |
| Makurap | 17.8 |
| Kangri | 50.6 |
| Breton | 60.3 |
| Telugu | 85.0 |
| Cantonese | 39.1 |
| Old Church Slavonic | 51.6 |
| Karelian | 71.3 |
| Upper Sorbian | 75.7 |
| South Levantine Arabic | 67.0 |
| Komi Zyrian | 43.0 |
| Irish | 60.1 |
| Nayini | 46.2 |
| Munduruku | 18.8 |
| Manx | 33.3 |
| Skolt Sami | 37.3 |
| Afrikaans | 76.4 |
| Old Turkish | 37.1 |
| Tupinamba | 34.1 |
| Belarusian | 89.1 |
| Serbian | 87.7 |
| Moksha | 46.3 |
| Western Armenian | 75.4 |
| Scottish Gaelic | 56.2 |
| Khunsari | 39.2 |
| Hebrew | 83.3 |
| Uyghur | 76.6 |
| Chukchi | 35.4 |
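For readers who want to compute a comparable figure on their own data, here is a minimal sketch of token-level accuracy. It assumes the accuracy reported above is the share of tokens whose predicted tag matches the gold tag, expressed as a percentage. The function name `token_accuracy` is hypothetical, and the exact evaluation procedure used for the reported numbers is not described in this card.

```python
def token_accuracy(gold_tags, predicted_tags):
    """Percentage of tokens whose predicted tag equals the gold tag.

    Assumption: both lists are aligned token by token and flattened
    over all sentences of the test set.
    """
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted tag lists must have the same length")
    if not gold_tags:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return 100.0 * correct / len(gold_tags)


gold = ["NOUN", "VERB", "ADJ", "PUNCT"]
pred = ["NOUN", "VERB", "NOUN", "PUNCT"]
print(token_accuracy(gold, pred))  # 3 of 4 tags match -> 75.0
```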