xlm-roberta-base-ft-udpos28-cy: Open-Source Part-of-Speech Tagging Model for Welsh

Developed by wietsedv

A POS tagging model based on XLM-RoBERTa base, fine-tuned on the Welsh data of Universal Dependencies v2.8 and evaluated for cross-lingual transfer on over 100 languages

Sequence Labeling

Transformers

License: Apache-2.0
Tags: Multilingual POS Tagging, High-Accuracy Welsh, Cross-lingual Transfer

Downloads 15

Release Date: 3/2/2022

Model Overview

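This is a part-of-speech (POS) tagging model built on the XLM-RoBERTa base architecture and fine-tuned on the Welsh data of Universal Dependencies v2.8. It tags Welsh with 94.9% test accuracy and can be applied to other languages through cross-lingual transfer, where accuracy varies widely by language (see the Model Index).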

Model Features

Multilingual Support

Tags Welsh, the fine-tuning language, and transfers with varying accuracy to the other languages listed in the Model Index

High Accuracy

Achieves 94.9% test accuracy on Welsh

Based on Universal Dependencies

Fine-tuned on Universal Dependencies v2.8, following standard annotation norms

Model Capabilities

POS Tagging

Multilingual Text Processing

Natural Language Processing

Use Cases

Natural Language Processing

Welsh Text Analysis

Perform POS tagging on Welsh text

94.9% test accuracy

Multilingual Text Processing

Handle POS tagging tasks for multiple languages

Test accuracy ranges from 14.6% (Assyrian) to 94.9% (Welsh) across the evaluated languages

🚀 XLM-RoBERTa base Universal Dependencies v2.8 POS tagging: Welsh

This model performs part-of-speech tagging. It is based on the XLM-RoBERTa base model, fine-tuned on Universal Dependencies v2.8 data for Welsh. The accompanying paper evaluates it on over 100 languages; the per-language test accuracies are listed in the Model Index.

🚀 Quick Start

This model was released as part of the paper:

Make the Best of Cross-lingual Transfer: Evidence from POS Tagging with over 100 Languages

Check the Space for more details.

✨ Features

Multilingual Support: Evaluated on part-of-speech tagging for over 100 languages.
High Accuracy: Reaches 94.9% test accuracy on Welsh; results for every evaluated language are listed in the Model Index.

📦 Installation

The model card gives no dedicated installation steps. The model is loaded through the Hugging Face Transformers library, as shown in the Usage Examples section.

💻 Usage Examples

Basic Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer and the fine-tuned token-classification model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-cy")
model = AutoModelForTokenClassification.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-cy")
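The snippet above only loads the model. As a minimal sketch of how the loaded objects might be used to tag a sentence, the example below keeps the prediction of the first subword of each word, which is a common convention for token classification and not something the model card specifies. The helper names `first_subword_tags` and `tag_words` are hypothetical, the input is assumed to be already split into words, and PyTorch is assumed to be installed.

```python
from typing import Dict, List, Optional, Tuple

MODEL_ID = "wietsedv/xlm-roberta-base-ft-udpos28-cy"


def first_subword_tags(
    words: List[str],
    word_ids: List[Optional[int]],
    label_ids: List[int],
    id2label: Dict[int, str],
) -> List[Tuple[str, str]]:
    """Pair each word with the label predicted for its first subword.

    word_ids maps every subword position to the index of its source word,
    or None for special tokens such as <s> and </s>.
    """
    tagged: List[Tuple[str, str]] = []
    seen = set()
    for word_id, label_id in zip(word_ids, label_ids):
        if word_id is None or word_id in seen:
            continue  # skip special tokens and non-initial subwords
        seen.add(word_id)
        tagged.append((words[word_id], id2label[label_id]))
    return tagged


def tag_words(words: List[str]) -> List[Tuple[str, str]]:
    """Tag a pre-split sentence (hypothetical helper; downloads the model on first use)."""
    # Imported lazily so that the pure helper above works without these packages.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)
    model.eval()

    # is_split_into_words=True keeps the mapping from subwords back to words.
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    label_ids = logits.argmax(dim=-1)[0].tolist()
    return first_subword_tags(words, enc.word_ids(), label_ids, model.config.id2label)
```

Calling `tag_words(["Mae", "hi", "'n", "bwrw", "glaw", "."])` would return a list of `(word, tag)` pairs; the way the Welsh sentence is split into words here is an illustrative assumption, and the tags returned depend on the label set stored in the checkpoint.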

📚 Documentation

Model Information

Property	Details
Model Type	Part-of-Speech Tagging, Token Classification
Training Data	Universal Dependencies v2.8 (Welsh)
Metrics	Accuracy

Model Index

Test accuracy of xlm-roberta-base-ft-udpos28-cy on each evaluation language, in percent:

Language	Test Accuracy (%)
English	78.9
Dutch	81.3
German	78.3
Italian	74.9
French	77.1
Spanish	81.0
Russian	82.0
Swedish	80.6
Norwegian	76.4
Danish	78.7
Low Saxon	52.7
Akkadian	42.4
Armenian	73.7
Welsh	94.9
Old East Slavic	71.6
Albanian	76.8
Slovenian	67.6
Guajajara	33.1
Kurmanji	77.1
Turkish	72.0
Finnish	77.1
Indonesian	75.0
Ukrainian	80.9
Polish	82.7
Portuguese	80.1
Kazakh	75.5
Latin	73.7
Old French	54.0
Buryat	60.2
Kaapor	21.2
Korean	56.8
Estonian	79.4
Croatian	79.6
Gothic	29.3
Swiss German	48.3
Assyrian	14.6
North Sami	45.4
Naija	35.7
Latvian	78.4
Chinese	39.9
Tagalog	71.9
Bambara	33.2
Lithuanian	77.7
Galician	79.0
Vietnamese	55.2
Greek	79.5
Catalan	78.1
Czech	80.7
Erzya	48.3
Bhojpuri	55.0
Thai	53.2
Marathi	78.5
Basque	69.5
Slovak	82.6
Kiche	41.2
Yoruba	33.9
Warlpiri	36.8
Tamil	75.5
Maltese	36.4
Ancient Greek	55.4
Icelandic	73.8
Mbya Guarani	33.4
Urdu	64.6
Romanian	76.5
Persian	78.7
Apurina	48.4
Japanese	28.6
Hungarian	79.9
Hindi	70.9
Classical Chinese	20.5
Komi Permyak	53.0
Faroese	73.1
Sanskrit	38.0
Livvi	65.3
Arabic	85.9
Wolof	43.4
Bulgarian	82.8
Akuntsu	36.0
Makurap	24.7
Kangri	47.2
Breton	61.8
Telugu	74.6
Cantonese	40.7
Old Church Slavonic	50.3
Karelian	70.6
Upper Sorbian	74.1
South Levantine Arabic	70.1
Komi Zyrian	44.7
Irish	69.5
Nayini	53.8
Munduruku	28.1
Manx	47.4
Skolt Sami	42.0
Afrikaans	74.7
Old Turkish	38.0
Tupinamba	37.4
Belarusian	84.5
Serbian	80.8
Moksha	47.7
Western Armenian	68.7
Scottish Gaelic	67.4
Khunsari	50.0
Hebrew	86.5
Uyghur	68.9
Chukchi	36.8
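Because transfer accuracy varies so widely, it can help to check the table before applying this checkpoint to a language other than Welsh. The sketch below copies a handful of rows from the table above and filters them by a cutoff; the name `usable_languages` and the 80% threshold are illustrative assumptions, not recommendations from the model card.

```python
from typing import Dict, List

# A few rows copied from the Model Index above (test accuracy in percent).
ACCURACY: Dict[str, float] = {
    "Welsh": 94.9,
    "Hebrew": 86.5,
    "Arabic": 85.9,
    "Irish": 69.5,
    "Scottish Gaelic": 67.4,
    "Breton": 61.8,
    "Manx": 47.4,
    "Assyrian": 14.6,
}


def usable_languages(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return languages at or above the threshold, best first."""
    kept = [lang for lang, acc in scores.items() if acc >= threshold]
    return sorted(kept, key=lambda lang: scores[lang], reverse=True)


best = max(ACCURACY, key=ACCURACY.get)    # language with the highest accuracy
worst = min(ACCURACY, key=ACCURACY.get)   # language with the lowest accuracy
print(best, worst)                        # prints: Welsh Assyrian
print(usable_languages(ACCURACY, 80.0))   # prints: ['Welsh', 'Hebrew', 'Arabic']
```

The same filter can be run over the full table by extending the dictionary with the remaining rows.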

📄 License

The model is licensed under the Apache-2.0 license.
