robertuito-pos Open-source Part-of-Speech Tagging Model - Free and Accurate Tagging for Spanish/English Twitter Texts

Robertuito Pos

Developed by pysentimiento

Spanish/English POS tagging model based on RoBERTuito, optimized for Twitter text

Sequence Labeling

Transformers

Spanish #Spanish POS tagging #Twitter text processing #Code-switching task

Downloads 188

Release Time: 7/17/2022

Model Overview

This model is designed for POS tagging of Spanish tweets. It is built on the RoBERTuito architecture and also supports POS analysis of mixed Spanish-English (code-switched) text.

Model Features

Twitter text optimization

Specifically trained for social media (Twitter) text, effectively handling informal language and internet slang

Bilingual support

Supports POS tagging for Spanish-English mixed texts

High performance

Achieves 97.2% POS tagging accuracy on the LinCE benchmark

Model Capabilities

POS tagging

Social media text processing

Bilingual mixed-text analysis

Use Cases

Social media analysis

Twitter content analysis

Perform POS tagging on Spanish tweets to support subsequent sentiment analysis or content classification

Accurately identifies POS in informal texts

Linguistic research

Code-switching research

Analyze linguistic features of Spanish-English mixed texts

Provides accurate POS tagging support

🚀 POS Tagging model for Spanish/English

This project presents robertuito-pos, a Part-of-Speech (POS) tagging model for Spanish and English. It is built on the pre-trained RoBERTuito model and trained on a code-switched corpus.

🚀 Quick Start

The model robertuito-pos is trained with the Spanish/English split of the LinCE NER corpus, a code-switched benchmark. The base model is RoBERTuito, a RoBERTa model trained on Spanish tweets.

Repository: https://github.com/pysentimiento/pysentimiento/

✨ Features

Trained on a code-switched benchmark corpus, suitable for Spanish and English POS tagging.
Based on the pre-trained RoBERTuito model, which performs well on Spanish tweets.

📦 Installation

Installation steps are not given here. The model is used through the pysentimiento library, so refer to that library's official documentation for installation instructions.
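The library is published on PyPI under the name `pysentimiento`, so `pip install pysentimiento` is the usual route (an assumption here; the repository README is authoritative). A minimal sketch for checking that the package is importable before running the examples follows. `is_installed` is a hypothetical helper, not part of pysentimiento.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the named top-level package can be located for import."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    if is_installed("pysentimiento"):
        print("pysentimiento is available")
    else:
        # Assumed install command; check the project's README for specifics.
        print("pysentimiento not found; try: pip install pysentimiento")
```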

💻 Usage Examples

Basic Usage

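from pysentimiento import create_analyzer

# Create the POS tagging analyzer (the source example uses lang="es")
pos_analyzer = create_analyzer("pos", lang="es")

# Tag a tweet; the output is shown below
pos_analyzer.predict("Quiero que esto funcione correctamente! @perezjotaeme")
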
 
>[{'type': 'PROPN', 'text': 'Quiero', 'start': 0, 'end': 6},
> {'type': 'SCONJ', 'text': 'que', 'start': 7, 'end': 10},
> {'type': 'PRON', 'text': 'esto', 'start': 11, 'end': 15},
> {'type': 'VERB', 'text': 'funcione', 'start': 16, 'end': 24},
> {'type': 'ADV', 'text': 'correctamente', 'start': 25, 'end': 38},
> {'type': 'PUNCT', 'text': '!', 'start': 38, 'end': 39},
> {'type': 'NOUN', 'text': '@perezjotaeme', 'start': 40, 'end': 53}]
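
Each prediction in the output above is a dict with `type`, `text`, `start`, and `end` keys, where `start` and `end` appear to be character offsets into the input string. The sketch below reuses the sample output verbatim rather than calling the model. It checks that offset assumption and groups tokens by tag. `check_offsets` and `group_by_tag` are hypothetical helper names, not pysentimiento functions.

```python
from collections import defaultdict

text = "Quiero que esto funcione correctamente! @perezjotaeme"

# Sample predictions copied from the output shown above.
tokens = [
    {"type": "PROPN", "text": "Quiero", "start": 0, "end": 6},
    {"type": "SCONJ", "text": "que", "start": 7, "end": 10},
    {"type": "PRON", "text": "esto", "start": 11, "end": 15},
    {"type": "VERB", "text": "funcione", "start": 16, "end": 24},
    {"type": "ADV", "text": "correctamente", "start": 25, "end": 38},
    {"type": "PUNCT", "text": "!", "start": 38, "end": 39},
    {"type": "NOUN", "text": "@perezjotaeme", "start": 40, "end": 53},
]


def check_offsets(text, tokens):
    """Return True if every token's text equals text[start:end]."""
    return all(text[t["start"]:t["end"]] == t["text"] for t in tokens)


def group_by_tag(tokens):
    """Map each POS tag to the list of token strings carrying it."""
    groups = defaultdict(list)
    for t in tokens:
        groups[t["type"]].append(t["text"])
    return dict(groups)


if __name__ == "__main__":
    print(check_offsets(text, tokens))
    print(group_by_tag(tokens))
```

Grouping by tag is one simple way to pull out, for example, only the verbs or nouns of a tweet before passing it to a downstream classifier.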

⚠️ Important Note

We suggest using this model directly through the pysentimiento library. It does not work properly with the Transformers pipeline because of tokenization issues.

📚 Documentation

Results

Results are taken from the LinCE leaderboard

Property	Details
Model Type	POS Tagging model
Training Data	Spanish/English split of the LinCE NER corpus

Model	Sentiment	NER	POS
RoBERTuito	60.6	68.5	97.2
XLM Large	--	69.5	97.2
XLM Base	--	64.9	97.0
C2S mBERT	59.1	64.6	96.9
mBERT	56.4	64.0	97.1
BERT	58.4	61.1	96.9
BETO	56.5	--	--
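
To read the table programmatically, the sketch below copies the scores into a dictionary and finds the top-scoring model or models per task. The numbers come from the table above; `best_models` is a hypothetical helper, and `None` stands in for the missing `--` entries.

```python
# Scores copied from the table above; None marks a missing ("--") entry.
results = {
    "RoBERTuito": {"Sentiment": 60.6, "NER": 68.5, "POS": 97.2},
    "XLM Large": {"Sentiment": None, "NER": 69.5, "POS": 97.2},
    "XLM Base": {"Sentiment": None, "NER": 64.9, "POS": 97.0},
    "C2S mBERT": {"Sentiment": 59.1, "NER": 64.6, "POS": 96.9},
    "mBERT": {"Sentiment": 56.4, "NER": 64.0, "POS": 97.1},
    "BERT": {"Sentiment": 58.4, "NER": 61.1, "POS": 96.9},
    "BETO": {"Sentiment": 56.5, "NER": None, "POS": None},
}


def best_models(results, task):
    """Return (best_score, sorted names of models achieving it) for a task."""
    scored = {m: s[task] for m, s in results.items() if s[task] is not None}
    top = max(scored.values())
    return top, sorted(m for m, v in scored.items() if v == top)


if __name__ == "__main__":
    for task in ("Sentiment", "NER", "POS"):
        print(task, best_models(results, task))
```

On POS, RoBERTuito and XLM Large tie for the top score in this table, while XLM Large leads on NER.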

📄 License

No license information is provided for this model.

📚 Citation

If you use this model in your research, please cite the pysentimiento, RoBERTuito, and LinCE papers:

@misc{perez2021pysentimiento,
      title={pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks},
      author={Juan Manuel Pérez and Juan Carlos Giudici and Franco Luque},
      year={2021},
      eprint={2106.09462},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@inproceedings{ortega2019overview,
  title={Overview of the task on irony detection in Spanish variants},
  author={Ortega-Bueno, Reynier and Rangel, Francisco and Hern{\'a}ndez Far{\i}as, D and Rosso, Paolo and Montes-y-G{\'o}mez, Manuel and Medina Pagola, Jos{\'e} E},
  booktitle={Proceedings of the Iberian languages evaluation forum (IberLEF 2019), co-located with 34th conference of the Spanish Society for natural language processing (SEPLN 2019). CEUR-WS. org},
  volume={2421},
  pages={229--256},
  year={2019}
}

@inproceedings{aguilar2020lince,
  title={LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation},
  author={Aguilar, Gustavo and Kar, Sudipta and Solorio, Thamar},
  booktitle={Proceedings of the 12th Language Resources and Evaluation Conference},
  pages={1803--1813},
  year={2020}
}
