đ XLM-RoBERTa base Universal Dependencies v2.8 POS tagging: Catalan
This model addresses the task of part - of - speech tagging in Catalan within the Universal Dependencies v2.8 framework. It is a valuable asset for natural language processing tasks, offering insights into cross - lingual transfer as presented in our related research.
đ Quick Start
This model is part of our paper called:
- Make the Best of Cross - lingual Transfer: Evidence from POS Tagging with over 100 Languages
Check the Space for more details.
đģ Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-ca")
model = AutoModelForTokenClassification.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-ca")
đ License
The model is released under the Apache 2.0 license.
đ Model Information
Property |
Details |
Model Type |
Transformer - based model for part - of - speech tagging |
Training Data |
Universal Dependencies v2.8 |
Metrics |
Accuracy |
đ Model Results
The following table shows the accuracy of the model on various languages:
Language |
Test Accuracy |
English |
86.3 |
Dutch |
87.2 |
German |
79.2 |
Italian |
90.2 |
French |
90.7 |
Spanish |
94.8 |
Russian |
89.1 |
Swedish |
89.5 |
Norwegian |
84.7 |
Danish |
89.3 |
Low Saxon |
53.3 |
Akkadian |
41.0 |
Armenian |
84.7 |
Welsh |
66.0 |
Old East Slavic |
77.4 |
Albanian |
79.2 |
Slovenian |
79.1 |
Guajajara |
32.9 |
Kurmanji |
78.2 |
Turkish |
76.2 |
Finnish |
84.7 |
Indonesian |
84.5 |
Ukrainian |
87.5 |
Polish |
87.4 |
Portuguese |
91.4 |
Kazakh |
80.6 |
Latin |
79.3 |
Old French |
66.5 |
Buryat |
62.8 |
Kaapor |
27.5 |
Korean |
61.6 |
Estonian |
87.2 |
Croatian |
88.8 |
Gothic |
29.1 |
Swiss German |
42.1 |
Assyrian |
17.2 |
North Sami |
41.0 |
Naija |
40.3 |
Latvian |
85.0 |
Chinese |
32.3 |
Tagalog |
72.5 |
Bambara |
29.8 |
Lithuanian |
84.1 |
Galician |
88.8 |
Vietnamese |
65.2 |
Greek |
85.9 |
Catalan |
98.7 |
Czech |
89.3 |
Erzya |
50.9 |
Bhojpuri |
49.7 |
Thai |
43.4 |
Marathi |
82.2 |
Basque |
74.9 |
Slovak |
89.6 |
Kiche |
39.2 |
Yoruba |
28.8 |
Warlpiri |
36.4 |
Tamil |
82.2 |
Maltese |
36.2 |
Ancient Greek |
62.0 |
Icelandic |
83.2 |
Mbya Guarani |
32.6 |
Urdu |
65.2 |
Romanian |
84.8 |
Persian |
76.7 |
Apurina |
37.3 |
Japanese |
19.9 |
Hungarian |
87.2 |
Hindi |
68.8 |
Classical Chinese |
19.2 |
Komi Permyak |
52.6 |
Faroese |
76.4 |
Sanskrit |
38.4 |
Livvi |
64.0 |
Arabic |
79.2 |
Wolof |
38.2 |
Bulgarian |
89.9 |
Akuntsu |
43.4 |
Makurap |
23.3 |
Kangri |
44.9 |
Breton |
63.5 |
Telugu |
85.0 |
Cantonese |
40.5 |
Old Church Slavonic |
57.8 |
Karelian |
73.3 |
Upper Sorbian |
75.8 |
South Levantine Arabic |
64.0 |
Komi Zyrian |
44.2 |
Irish |
67.2 |
Nayini |
50.0 |
Munduruku |
28.8 |
Manx |
35.3 |
Skolt Sami |
41.3 |
Afrikaans |
86.0 |
Old Turkish |
45.7 |
Tupinamba |
36.6 |
Belarusian |
86.0 |
Serbian |
90.4 |
Moksha |
47.7 |
Western Armenian |
78.7 |
Scottish Gaelic |
54.8 |
Khunsari |
47.3 |
Hebrew |
91.7 |
Uyghur |
75.4 |
Chukchi |
34.9 |