POS Tagging model for Spanish/English
This project presents robertuito-pos, a Part-of-Speech (POS) tagging model for Spanish and English. It builds on a pre-trained language model and a code-switched corpus to tag text that mixes both languages.
Quick Start
The robertuito-pos model is trained on the Spanish/English split of the LinCE NER corpus, a code-switched benchmark. The base model is RoBERTuito, a RoBERTa model trained on Spanish tweets.
Repository: https://github.com/pysentimiento/pysentimiento/
Features
- Trained on a code-switched benchmark corpus, making it suitable for Spanish and English POS tagging.
- Based on the pre-trained RoBERTuito model, a RoBERTa model trained on Spanish tweets.
Installation
Installation steps are not listed here. To use this model, refer to the official documentation of the pysentimiento library for installation instructions.
Usage Examples
Basic Usage
from pysentimiento import create_analyzer

pos_analyzer = create_analyzer("pos", lang="es")
pos_analyzer.predict("Quiero que esto funcione correctamente! @perezjotaeme")
# Output:
# [{'type': 'PROPN', 'text': 'Quiero', 'start': 0, 'end': 6},
#  {'type': 'SCONJ', 'text': 'que', 'start': 7, 'end': 10},
#  {'type': 'PRON', 'text': 'esto', 'start': 11, 'end': 15},
#  {'type': 'VERB', 'text': 'funcione', 'start': 16, 'end': 24},
#  {'type': 'ADV', 'text': 'correctamente', 'start': 25, 'end': 38},
#  {'type': 'PUNCT', 'text': '!', 'start': 38, 'end': 39},
#  {'type': 'NOUN', 'text': '@perezjotaeme', 'start': 40, 'end': 53}]
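The tagged tokens can be post-processed with plain Python. The sketch below assumes the result has the list-of-dicts shape shown in the example output (keys `type`, `text`, `start`, `end`); depending on the library version, the analyzer may wrap this list in a result object. The token list is copied from the example rather than produced by running the model, and the helper names `check_offsets` and `group_by_tag` are hypothetical.

```python
from collections import defaultdict

text = "Quiero que esto funcione correctamente! @perezjotaeme"

# Tokens copied from the example output (not produced by running the model here).
tokens = [
    {"type": "PROPN", "text": "Quiero", "start": 0, "end": 6},
    {"type": "SCONJ", "text": "que", "start": 7, "end": 10},
    {"type": "PRON", "text": "esto", "start": 11, "end": 15},
    {"type": "VERB", "text": "funcione", "start": 16, "end": 24},
    {"type": "ADV", "text": "correctamente", "start": 25, "end": 38},
    {"type": "PUNCT", "text": "!", "start": 38, "end": 39},
    {"type": "NOUN", "text": "@perezjotaeme", "start": 40, "end": 53},
]


def check_offsets(text, tokens):
    """Return True if every token's text equals the slice text[start:end]."""
    return all(text[t["start"]:t["end"]] == t["text"] for t in tokens)


def group_by_tag(tokens):
    """Map each POS tag to the list of token strings that carry it."""
    groups = defaultdict(list)
    for t in tokens:
        groups[t["type"]].append(t["text"])
    return dict(groups)


offsets_ok = check_offsets(text, tokens)               # character offsets are end-exclusive
tagged = [(t["text"], t["type"]) for t in tokens]      # (token, tag) pairs in sentence order
by_tag = group_by_tag(tokens)

print(offsets_ok)
print(tagged)
print(by_tag)
```

Checking the offsets against the input string is a cheap way to confirm that `start` and `end` index the original text, which matters when highlighting tokens in the source sentence.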
Important Note
If you want to use this model, we suggest using it directly from the pysentimiento library, as it does not work properly with the pipeline due to tokenization issues.
Documentation
Results
Results are taken from the LinCE leaderboard.
| Property | Details |
|---|---|
| Model Type | POS Tagging model |
| Training Data | Spanish/English split of the LinCE NER corpus |
| Model | Sentiment | NER | POS |
|---|---|---|---|
| RoBERTuito | 60.6 | 68.5 | 97.2 |
| XLM Large | -- | 69.5 | 97.2 |
| XLM Base | -- | 64.9 | 97.0 |
| C2S mBERT | 59.1 | 64.6 | 96.9 |
| mBERT | 56.4 | 64.0 | 97.1 |
| BERT | 58.4 | 61.1 | 96.9 |
| BETO | 56.5 | -- | -- |
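To compare the models task by task, the scores can be loaded into a small dictionary. This is only an illustrative sketch: the numbers are transcribed from the results table, entries reported as `--` are stored as `None`, and the helper `best_models` is a hypothetical name.

```python
# Scores transcribed from the results table; None marks entries reported as "--".
results = {
    "RoBERTuito": {"Sentiment": 60.6, "NER": 68.5, "POS": 97.2},
    "XLM Large": {"Sentiment": None, "NER": 69.5, "POS": 97.2},
    "XLM Base": {"Sentiment": None, "NER": 64.9, "POS": 97.0},
    "C2S mBERT": {"Sentiment": 59.1, "NER": 64.6, "POS": 96.9},
    "mBERT": {"Sentiment": 56.4, "NER": 64.0, "POS": 97.1},
    "BERT": {"Sentiment": 58.4, "NER": 61.1, "POS": 96.9},
    "BETO": {"Sentiment": 56.5, "NER": None, "POS": None},
}


def best_models(results, task):
    """Return (top score, sorted models reaching it), skipping missing entries."""
    scored = {m: s[task] for m, s in results.items() if s[task] is not None}
    top = max(scored.values())
    return top, sorted(m for m, v in scored.items() if v == top)


for task in ("Sentiment", "NER", "POS"):
    score, models = best_models(results, task)
    print(f"{task}: {score} ({', '.join(models)})")
```

On these figures RoBERTuito and XLM Large tie on POS at 97.2, XLM Large leads on NER, and RoBERTuito has the highest reported Sentiment score.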
License
License information is not provided.
Citation
If you use this model in your research, please cite the pysentimiento, RoBERTuito, and LinCE papers:
@misc{perez2021pysentimiento,
title={pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks},
author={Juan Manuel Pérez and Juan Carlos Giudici and Franco Luque},
year={2021},
eprint={2106.09462},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@inproceedings{ortega2019overview,
title={Overview of the task on irony detection in Spanish variants},
author={Ortega-Bueno, Reynier and Rangel, Francisco and Hern{\'a}ndez Far{\'\i}as, D and Rosso, Paolo and Montes-y-G{\'o}mez, Manuel and Medina Pagola, Jos{\'e} E},
booktitle={Proceedings of the Iberian languages evaluation forum (IberLEF 2019), co-located with 34th conference of the Spanish Society for natural language processing (SEPLN 2019). CEUR-WS. org},
volume={2421},
pages={229--256},
year={2019}
}
@inproceedings{aguilar2020lince,
title={LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation},
author={Aguilar, Gustavo and Kar, Sudipta and Solorio, Thamar},
booktitle={Proceedings of the 12th Language Resources and Evaluation Conference},
pages={1803--1813},
year={2020}
}