xlm-roberta-base-ft-udpos28-ro: open-source multilingual part-of-speech tagging model for Romanian

Developed by wietsedv

A multilingual POS tagging model based on XLM-RoBERTa, fine-tuned on the Universal Dependencies v2.8 dataset for Romanian.

Sequence Labeling

Transformers

License: Apache-2.0

Tags: Multilingual POS tagging, Romanian language optimization, High-accuracy POS

Downloads 14

Release Time: 3/2/2022

Model Overview

This multilingual POS tagging model is based on the XLM-RoBERTa architecture and fine-tuned on the Universal Dependencies v2.8 dataset. It targets Romanian, where it reaches 96.8% test accuracy.

Model Features

Multilingual support: Supports POS tagging for multiple languages, including Romanian.
High accuracy: Achieves 96.8% test accuracy on Romanian.
Based on Universal Dependencies: Fine-tuned on the Universal Dependencies v2.8 dataset, which has broad language coverage.

Model Capabilities

POS tagging

Multilingual text processing

Token classification

Use Cases

Natural Language Processing

Romanian text analysis: Performs POS tagging on Romanian text, with 96.8% test accuracy.
Multilingual text processing: Supports POS tagging for multiple languages; see the Metrics table for per-language accuracy.

🚀 XLM-RoBERTa base Universal Dependencies v2.8 POS tagging: Romanian

This model performs part-of-speech tagging across multiple languages. It is one of the models from a study of cross-lingual transfer that draws on POS tagging results for over 100 languages. Check the Space for more details.

🚀 Quick Start

This model is part of the paper "Make the Best of Cross-lingual Transfer: Evidence from POS Tagging with over 100 Languages". More details are available in the Space.

✨ Features

Multilingual Support: Performs part-of-speech tagging on over 100 languages.
Cross-lingual Transfer: Demonstrates effective cross-lingual transfer in POS tagging tasks.

💻 Usage Examples

Basic Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer and the token-classification model by their model id
tokenizer = AutoTokenizer.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-ro")
model = AutoModelForTokenClassification.from_pretrained("wietsedv/xlm-roberta-base-ft-udpos28-ro")
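The loading code above stops before any prediction is made. The following is a minimal sketch of one way to get a tag per word, not the author's documented method. It assumes PyTorch is installed, that the checkpoint loads a fast tokenizer (needed for `word_ids`), and that using the label predicted for each word's first subword is acceptable. The helper names `first_subword_tags` and `tag_words` are illustrative and do not come from the original README.

```python
from typing import Dict, List, Optional, Tuple

MODEL_ID = "wietsedv/xlm-roberta-base-ft-udpos28-ro"


def first_subword_tags(
    word_ids: List[Optional[int]],
    label_ids: List[int],
    id2label: Dict[int, str],
) -> List[str]:
    """Keep the predicted label of the first subword of each word."""
    tags: List[str] = []
    previous = None
    for word_id, label_id in zip(word_ids, label_ids):
        # Special tokens have word_id None; later subwords repeat the same id.
        if word_id is not None and word_id != previous:
            tags.append(id2label[label_id])
        previous = word_id
    return tags


def tag_words(words: List[str]) -> List[Tuple[str, str]]:
    """Tag a pre-split sentence (hypothetical helper; downloads the model)."""
    # Imported lazily so the pure helper above works without these packages.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)
    model.eval()

    # Assumption: the input is already split into words.
    encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits  # shape: (1, seq_len, num_labels)
    label_ids = logits.argmax(dim=-1)[0].tolist()

    tags = first_subword_tags(
        encoding.word_ids(0), label_ids, model.config.id2label
    )
    return list(zip(words, tags))
```

Calling `tag_words(["Aceasta", "este", "o", "propoziție", "."])` would download the checkpoint and return a list of (word, tag) pairs. No output is shown here because it depends on the model's predictions.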

📚 Documentation

Model Information

Property	Details
Model Type	xlm-roberta-base-ft-udpos28-ro
Training Data	Universal Dependencies v2.8

Metrics

The following table shows the model's test accuracy (%) on each language:

Language	Test Accuracy (%)
English	88.4
Dutch	86.1
German	87.3
Italian	88.2
French	91.3
Spanish	91.1
Russian	90.4
Swedish	90.7
Norwegian	85.0
Danish	91.0
Low Saxon	56.2
Akkadian	41.8
Armenian	88.4
Welsh	71.7
Old East Slavic	78.7
Albanian	90.2
Slovenian	80.3
Guajajara	39.3
Kurmanji	79.5
Turkish	79.5
Finnish	86.0
Indonesian	84.2
Ukrainian	89.7
Polish	89.5
Portuguese	90.3
Kazakh	85.0
Latin	81.8
Old French	65.7
Buryat	64.9
Kaapor	27.1
Korean	64.3
Estonian	87.5
Croatian	89.7
Gothic	35.1
Swiss German	55.5
Assyrian	16.8
North Sami	45.0
Naija	43.8
Latvian	89.5
Chinese	54.9
Tagalog	74.0
Bambara	32.9
Lithuanian	87.7
Galician	89.9
Vietnamese	66.2
Greek	88.9
Catalan	90.0
Czech	89.8
Erzya	51.5
Bhojpuri	55.0
Thai	64.9
Marathi	87.1
Basque	80.7
Slovak	89.8
Kiche	42.4
Yoruba	30.3
Warlpiri	46.2
Tamil	82.5
Maltese	38.3
Ancient Greek	67.8
Icelandic	85.1
Mbya Guarani	34.4
Urdu	63.4
Romanian	96.8
Persian	79.0
Apurina	43.1
Japanese	43.7
Hungarian	79.9
Hindi	70.6
Classical Chinese	40.8
Komi Permyak	57.2
Faroese	80.9
Sanskrit	40.4
Livvi	66.9
Arabic	83.5
Wolof	43.1
Bulgarian	91.2
Akuntsu	40.6
Makurap	20.5
Kangri	53.7
Breton	68.7
Telugu	82.9
Cantonese	57.0
Old Church Slavonic	59.1
Karelian	75.0
Upper Sorbian	77.8
South Levantine Arabic	71.2
Komi Zyrian	47.0
Irish	69.4
Nayini	56.4
Munduruku	29.2
Manx	38.8
Skolt Sami	43.7
Afrikaans	88.2
Old Turkish	37.1
Tupinamba	44.5
Belarusian	90.4
Serbian	89.5
Moksha	49.1
Western Armenian	82.0
Scottish Gaelic	63.1
Khunsari	47.3
Hebrew	88.5
Uyghur	78.0
Chukchi	37.5
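With more than 100 rows, the table is easier to use programmatically, for example to check which languages clear a given accuracy threshold. The sketch below parses tab-separated rows in the layout used above and filters them. The function names `parse_scores` and `languages_at_least` are illustrative, and only a handful of rows from the table are included to keep the example short.

```python
from typing import Dict, List, Tuple

# A few rows copied from the table above (language<TAB>accuracy).
TABLE = """Language\tTest Accuracy
Romanian\t96.8
French\t91.3
Italian\t88.2
English\t88.4
Chinese\t54.9
Assyrian\t16.8"""


def parse_scores(text: str) -> Dict[str, float]:
    """Map each language name to its test accuracy, skipping the header row."""
    scores: Dict[str, float] = {}
    for row in text.strip().splitlines()[1:]:
        # Split on the last tab so multi-word language names stay intact.
        language, value = row.rsplit("\t", 1)
        scores[language] = float(value)
    return scores


def languages_at_least(
    scores: Dict[str, float], threshold: float
) -> List[Tuple[str, float]]:
    """Return (language, accuracy) pairs at or above threshold, best first."""
    kept = [(lang, acc) for lang, acc in scores.items() if acc >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


scores = parse_scores(TABLE)
strong = languages_at_least(scores, 90.0)
```

On these six rows, `strong` holds Romanian and French, the only two entries at or above 90.0.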

📄 License

This model is released under the Apache 2.0 license.
