XLM-RoBERTa base Universal Dependencies v2.8 POS tagging: Welsh
This model performs part-of-speech (POS) tagging. It is the XLM-RoBERTa base model fine-tuned on the Universal Dependencies v2.8 data for Welsh. In addition to Welsh, it was evaluated on over 100 other languages to measure cross-lingual transfer, as described in the paper.
Quick Start
This model is part of our paper called:
- Make the Best of Cross-lingual Transfer: Evidence from POS Tagging with over 100 Languages
Check the Space for more details.
Features
- Multilingual Support: Fine-tuned on Welsh and evaluated on over 100 languages through cross-lingual transfer.
- High Accuracy: Reaches a test accuracy of 94.9 on Welsh. Accuracy on the other languages varies widely, from 14.6 (Assyrian) to 86.5 (Hebrew), as shown in the model index.
Installation
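No specific installation steps are provided. The usage example imports the transformers library, which can be installed with pip install transformers, and needs a backend such as PyTorch to run the model.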
Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer and the fine-tuned token-classification model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-cy")
model = AutoModelForTokenClassification.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-cy")
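
The snippet above only loads the tokenizer and model. The following is a minimal sketch of how they might be used to tag one sentence, not the authors' evaluation code. The helper names `word_level_tags` and `tag_sentence` are hypothetical. The sketch assumes PyTorch is installed, that a fast tokenizer is available (needed for `word_ids()`), and that each word takes the tag predicted for its first subword, which is a common convention but not necessarily the one used in the paper.

```python
from typing import Dict, List, Optional, Tuple

MODEL_NAME = "wietsedv/xlm-roberta-base-ft-udpos28-cy"


def word_level_tags(
    words: List[str],
    word_ids: List[Optional[int]],
    label_ids: List[int],
    id2label: Dict[int, str],
) -> List[Tuple[str, str]]:
    """Pair each word with the label predicted for its first subword (assumed convention)."""
    tagged: List[Tuple[str, str]] = []
    seen = set()
    for word_id, label_id in zip(word_ids, label_ids):
        # word_id is None for special tokens such as <s> and </s>.
        if word_id is None or word_id in seen:
            continue
        seen.add(word_id)
        tagged.append((words[word_id], id2label[label_id]))
    return tagged


def tag_sentence(words: List[str]) -> List[Tuple[str, str]]:
    """Download the model and tag one pre-split sentence (hypothetical helper)."""
    # Imported lazily so word_level_tags can be used without these packages.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME)
    model.eval()

    # is_split_into_words=True tells the tokenizer the input is a list of words.
    encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits  # shape: (1, sequence_length, num_labels)
    label_ids = logits.argmax(dim=-1)[0].tolist()
    return word_level_tags(words, encoding.word_ids(0), label_ids, model.config.id2label)
```

Calling `tag_sentence("Mae'r tywydd yn braf heddiw".split())` would return a list of (word, tag) pairs. Splitting on whitespace is a simplification: the Universal Dependencies treebank may tokenize clitics differently, so the tags for such words should be read with care.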
Documentation
Model Information
| Property | Details |
|---|---|
| Model Type | Part-of-Speech Tagging, Token Classification |
| Training Data | Universal Dependencies |
| Metrics | Accuracy |
Model Index
The model named xlm-roberta-base-ft-udpos28-cy has the following results:
| Language | Test Accuracy (%) |
|---|---|
| English | 78.9 |
| Dutch | 81.3 |
| German | 78.3 |
| Italian | 74.9 |
| French | 77.1 |
| Spanish | 81.0 |
| Russian | 82.0 |
| Swedish | 80.6 |
| Norwegian | 76.4 |
| Danish | 78.7 |
| Low Saxon | 52.7 |
| Akkadian | 42.4 |
| Armenian | 73.7 |
| Welsh | 94.9 |
| Old East Slavic | 71.6 |
| Albanian | 76.8 |
| Slovenian | 67.6 |
| Guajajara | 33.1 |
| Kurmanji | 77.1 |
| Turkish | 72.0 |
| Finnish | 77.1 |
| Indonesian | 75.0 |
| Ukrainian | 80.9 |
| Polish | 82.7 |
| Portuguese | 80.1 |
| Kazakh | 75.5 |
| Latin | 73.7 |
| Old French | 54.0 |
| Buryat | 60.2 |
| Kaapor | 21.2 |
| Korean | 56.8 |
| Estonian | 79.4 |
| Croatian | 79.6 |
| Gothic | 29.3 |
| Swiss German | 48.3 |
| Assyrian | 14.6 |
| North Sami | 45.4 |
| Naija | 35.7 |
| Latvian | 78.4 |
| Chinese | 39.9 |
| Tagalog | 71.9 |
| Bambara | 33.2 |
| Lithuanian | 77.7 |
| Galician | 79.0 |
| Vietnamese | 55.2 |
| Greek | 79.5 |
| Catalan | 78.1 |
| Czech | 80.7 |
| Erzya | 48.3 |
| Bhojpuri | 55.0 |
| Thai | 53.2 |
| Marathi | 78.5 |
| Basque | 69.5 |
| Slovak | 82.6 |
| Kiche | 41.2 |
| Yoruba | 33.9 |
| Warlpiri | 36.8 |
| Tamil | 75.5 |
| Maltese | 36.4 |
| Ancient Greek | 55.4 |
| Icelandic | 73.8 |
| Mbya Guarani | 33.4 |
| Urdu | 64.6 |
| Romanian | 76.5 |
| Persian | 78.7 |
| Apurina | 48.4 |
| Japanese | 28.6 |
| Hungarian | 79.9 |
| Hindi | 70.9 |
| Classical Chinese | 20.5 |
| Komi Permyak | 53.0 |
| Faroese | 73.1 |
| Sanskrit | 38.0 |
| Livvi | 65.3 |
| Arabic | 85.9 |
| Wolof | 43.4 |
| Bulgarian | 82.8 |
| Akuntsu | 36.0 |
| Makurap | 24.7 |
| Kangri | 47.2 |
| Breton | 61.8 |
| Telugu | 74.6 |
| Cantonese | 40.7 |
| Old Church Slavonic | 50.3 |
| Karelian | 70.6 |
| Upper Sorbian | 74.1 |
| South Levantine Arabic | 70.1 |
| Komi Zyrian | 44.7 |
| Irish | 69.5 |
| Nayini | 53.8 |
| Munduruku | 28.1 |
| Manx | 47.4 |
| Skolt Sami | 42.0 |
| Afrikaans | 74.7 |
| Old Turkish | 38.0 |
| Tupinamba | 37.4 |
| Belarusian | 84.5 |
| Serbian | 80.8 |
| Moksha | 47.7 |
| Western Armenian | 68.7 |
| Scottish Gaelic | 67.4 |
| Khunsari | 50.0 |
| Hebrew | 86.5 |
| Uyghur | 68.9 |
| Chukchi | 36.8 |
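
One way to read the results above is to compare the Welsh score with scores on languages the model was not fine-tuned on. The sketch below does this for the other Celtic languages in the table. The function name `transfer_gaps` is hypothetical, and the values are copied from a small subset of the rows, so this is an illustration of the arithmetic rather than an analysis from the paper.

```python
# Test accuracies copied from the results above (a small subset, not all rows).
accuracy = {
    "Welsh": 94.9,
    "Breton": 61.8,
    "Irish": 69.5,
    "Scottish Gaelic": 67.4,
    "Manx": 47.4,
}

SOURCE = "Welsh"  # the language the model was fine-tuned on


def transfer_gaps(scores, source):
    """Return (language, gap) pairs sorted by smallest gap, where gap = source accuracy - target accuracy."""
    base = scores[source]
    gaps = [
        (lang, round(base - acc, 1))
        for lang, acc in scores.items()
        if lang != source
    ]
    return sorted(gaps, key=lambda pair: pair[1])


for lang, gap in transfer_gaps(accuracy, SOURCE):
    print(f"{lang}: {gap:.1f} points below {SOURCE}")
```

The same function can be applied to any other subset of rows by extending the `accuracy` dictionary.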
License
The model is licensed under the Apache-2.0 license.