robertuito-sentiment-analysis Open-Source Model - Free Deployment for Spanish Tweet Sentiment Classification

Robertuito Sentiment Analysis

Developed by pysentimiento

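A sentiment analysis model for Spanish tweets, based on RoBERTuito, that classifies text into three classes: POS (positive), NEG (negative) and NEU (neutral)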

Text Classification  Spanish  #Spanish tweet sentiment analysis #Multi-dialect coverage #RoBERTa architecture optimization

Downloads: 1.0M

Released: 3/2/2022

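Model Overview

This model performs sentiment analysis on Spanish social media text, tweets in particular. It was trained on the TASS 2020 corpus, which covers several Spanish dialects.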

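Model Features

Dialect coverage

The training data includes several Spanish dialect variants

Optimized for social media

Based on RoBERTuito, a model pre-trained specifically on Spanish tweets

Lightweight deployment

Can be integrated into applications quickly through the pysentimiento library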

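Model Capabilities

Sentiment classification of Spanish text

Social media text analysis

Sentiment recognition across Spanish dialects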

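Use Cases

Social media monitoring

Brand sentiment analysis

Classify Spanish-speaking users' comments about a brand or product as positive, negative or neutral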

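Macro F1 of 0.705 on the sentiment task; because macro F1 averages the per-class F1 scores, this does not mean that 70% of sentiments are identified correctly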

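Market research

Product feedback analysis

Flag positive and negative sentiment in Spanish-language user reviews to help locate areas for product improvement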
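The macro F1 of 0.705 cited under brand sentiment analysis is the unweighted mean of the per-class F1 scores for POS, NEG and NEU, so every class counts equally no matter how often it occurs. The toy sketch below uses made-up labels (not TASS data) to show why this differs from accuracy: a degenerate classifier that always answers NEU on a NEU-heavy sample gets high accuracy but a low macro F1.

```python
LABELS = ("POS", "NEG", "NEU")


def f1_for_class(y_true, y_pred, label):
    """One-vs-rest F1 for a single label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # With no true positives F1 is 0; this also avoids dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1: each label counts equally."""
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (made up): 8 neutral tweets, 1 positive, 1 negative.
y_true = ["NEU"] * 8 + ["POS", "NEG"]
y_pred = ["NEU"] * 10  # always predicts NEU

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # accuracy = 0.80
print(f"macro F1 = {macro_f1(y_true, y_pred):.2f}")  # macro F1 = 0.30
```

Here NEU alone scores an F1 of 8/9, but POS and NEG score 0, pulling the macro average down to 8/27.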

🚀 Sentiment Analysis in Spanish

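robertuito-sentiment-analysis is a model for sentiment analysis in Spanish. It was trained on the TASS 2020 corpus (about 5,000 tweets), which covers several Spanish dialects. The base model is RoBERTuito, a RoBERTa model pre-trained on Spanish tweets, and the model classifies sentiment with the labels POS, NEG and NEU.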
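As in other RoBERTa sequence classifiers, the classification head produces one raw score (logit) per label, and a softmax is the standard way to turn those scores into probabilities like the `probas` that pysentimiento returns; the predicted label is the most probable one. A minimal sketch of that last step, assuming made-up logits and the label order POS, NEG, NEU purely for illustration (the model's internal label order may differ, and this is not pysentimiento's code):

```python
import math

LABELS = ("POS", "NEG", "NEU")  # assumed order, for illustration only


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, labels=LABELS):
    """Turn one score per label into (predicted label, {label: probability})."""
    probas = dict(zip(labels, softmax(logits)))
    label = max(probas, key=probas.get)  # argmax over the three classes
    return label, probas


# Hypothetical logits for one tweet; a real model would compute these.
label, probas = classify([4.0, -1.0, 0.5])
print(label)  # POS
```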

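🚀 Quick Start

Installation

The model can be used directly through the pysentimiento library, which can be installed with pip install pysentimiento.

Usage Example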

from pysentimiento import create_analyzer
analyzer = create_analyzer(task="sentiment", lang="es")

analyzer.predict("Qué gran jugador es Messi")
# returns AnalyzerOutput(output=POS, probas={POS: 0.998, NEG: 0.002, NEU: 0.000})
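For monitoring use cases, predictions are usually summarized over many texts rather than read one at a time. The sketch below is a hypothetical helper, not part of pysentimiento: `Prediction` stands in for `AnalyzerOutput` (mirroring the `output` and `probas` fields visible in its repr) so the code runs without downloading the model, and the 0.7 confidence cutoff is an arbitrary choice.

```python
from collections import Counter
from dataclasses import dataclass, field

LABELS = ("POS", "NEG", "NEU")


@dataclass
class Prediction:
    """Hypothetical stand-in with the two fields shown in AnalyzerOutput's repr."""
    output: str
    probas: dict = field(default_factory=dict)


def summarize(predictions, min_confidence=0.7):
    """Return (share of each label among confident predictions, number set aside).

    A prediction counts as confident when the probability of its predicted
    label reaches min_confidence (a hypothetical cutoff for illustration).
    """
    confident = [p for p in predictions
                 if p.probas.get(p.output, 0.0) >= min_confidence]
    counts = Counter(p.output for p in confident)
    total = len(confident)
    shares = {label: (counts[label] / total if total else 0.0) for label in LABELS}
    return shares, len(predictions) - total


batch = [
    Prediction("POS", {"POS": 0.998, "NEG": 0.002, "NEU": 0.000}),  # probas from the example above
    Prediction("NEG", {"POS": 0.05, "NEG": 0.90, "NEU": 0.05}),
    Prediction("NEG", {"POS": 0.10, "NEG": 0.80, "NEU": 0.10}),
    Prediction("NEU", {"POS": 0.30, "NEG": 0.25, "NEU": 0.45}),  # low confidence
]
shares, set_aside = summarize(batch)
print(shares, set_aside)
```

With the real analyzer, the batch would come from `[analyzer.predict(t) for t in texts]`, assuming each `AnalyzerOutput` exposes `output` as a label string and `probas` as a dict, as its repr suggests. Setting aside low-confidence predictions keeps ambiguous texts out of the headline shares so they can be reviewed separately.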

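✨ Key Features

Multi-dialect support: trained on the TASS 2020 corpus, which covers several Spanish dialects, so the model can handle text from different dialect regions.
Pre-trained base: built on RoBERTuito, which was pre-trained on over 500 million Spanish tweets and is therefore suited to informal, user-generated text.
Standard labels: uses the conventional POS, NEG and NEU labels for sentiment classification.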

📚 Documentation

Model repository

Repository URL: https://github.com/pysentimiento/pysentimiento/

Evaluation Results

The table below lists results for the four tasks in pysentimiento, reported as macro F1 scores:

Model	Emotion	Hate speech	Irony	Sentiment
robertuito	0.560 ± 0.010	0.759 ± 0.007	0.739 ± 0.005	0.705 ± 0.003
roberta	0.527 ± 0.015	0.741 ± 0.012	0.721 ± 0.008	0.670 ± 0.006
bertin	0.524 ± 0.007	0.738 ± 0.007	0.713 ± 0.012	0.666 ± 0.005
beto_uncased	0.532 ± 0.012	0.727 ± 0.016	0.701 ± 0.007	0.651 ± 0.006
beto_cased	0.516 ± 0.012	0.724 ± 0.012	0.705 ± 0.009	0.662 ± 0.005
mbert_uncased	0.493 ± 0.010	0.718 ± 0.011	0.681 ± 0.010	0.617 ± 0.003
biGRU	0.264 ± 0.007	0.592 ± 0.018	0.631 ± 0.011	0.585 ± 0.011

Note that for hate speech detection, these are the results for SemEval 2019 Task 5, Subtask B.
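The table can also be read task by task. A small sketch that copies the mean scores from the table (± values dropped; the task keys are shorthand chosen here) and reports, for each task, the best model and its margin over the runner-up:

```python
# Mean macro F1 scores copied from the evaluation table (± values omitted).
TASKS = ("emotion", "hate_speech", "irony", "sentiment")
RESULTS = {
    "robertuito":    (0.560, 0.759, 0.739, 0.705),
    "roberta":       (0.527, 0.741, 0.721, 0.670),
    "bertin":        (0.524, 0.738, 0.713, 0.666),
    "beto_uncased":  (0.532, 0.727, 0.701, 0.651),
    "beto_cased":    (0.516, 0.724, 0.705, 0.662),
    "mbert_uncased": (0.493, 0.718, 0.681, 0.617),
    "biGRU":         (0.264, 0.592, 0.631, 0.585),
}


def best_per_task(results, tasks=TASKS):
    """Map each task to (best model, runner-up, margin) by mean macro F1."""
    summary = {}
    for i, task in enumerate(tasks):
        scores = {model: values[i] for model, values in results.items()}
        ranked = sorted(scores, key=scores.get, reverse=True)
        best, runner_up = ranked[0], ranked[1]
        summary[task] = (best, runner_up, round(scores[best] - scores[runner_up], 3))
    return summary


for task, (best, runner_up, margin) in best_per_task(RESULTS).items():
    print(f"{task}: {best} leads {runner_up} by {margin:.3f}")
```

On these numbers robertuito leads every task; for sentiment, its 0.035 margin over roberta is larger than either model's reported ± value (0.003 and 0.006), although this simple comparison is not a significance test.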

📄 Citation

If you use this model in your research, please cite the pysentimiento, RoBERTuito and TASS 2020 papers:

@article{perez2021pysentimiento,
  title={pysentimiento: a python toolkit for opinion mining and social NLP tasks},
  author={P{\'e}rez, Juan Manuel and Rajngewerc, Mariela and Giudici, Juan Carlos and Furman, Dami{\'a}n A and Luque, Franco and Alemany, Laura Alonso and Mart{\'\i}nez, Mar{\'\i}a Vanina},
  journal={arXiv preprint arXiv:2106.09462},
  year={2021}
}

@inproceedings{perez-etal-2022-robertuito,
    title = "{R}o{BERT}uito: a pre-trained language model for social media text in {S}panish",
    author = "P{\'e}rez, Juan Manuel  and
      Furman, Dami{\'a}n Ariel  and
      Alonso Alemany, Laura  and
      Luque, Franco M.",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.lrec-1.785",
    pages = "7235--7243",
    abstract = "Since BERT appeared, Transformer language models and transfer learning have become state-of-the-art for natural language processing tasks. Recently, some works geared towards pre-training specially-crafted models for particular domains, such as scientific papers, medical documents, user-generated texts, among others. These domain-specific models have been shown to improve performance significantly in most tasks; however, for languages other than English, such models are not widely available. In this work, we present RoBERTuito, a pre-trained language model for user-generated text in Spanish, trained on over 500 million tweets. Experiments on a benchmark of tasks involving user-generated text showed that RoBERTuito outperformed other pre-trained language models in Spanish. In addition to this, our model has some cross-lingual abilities, achieving top results for English-Spanish tasks of the Linguistic Code-Switching Evaluation benchmark (LinCE) and also competitive performance against monolingual models in English Twitter tasks. To facilitate further research, we make RoBERTuito publicly available at the HuggingFace model hub together with the dataset used to pre-train it.",
}

@inproceedings{garcia2020overview,
  title={Overview of TASS 2020: Introducing emotion detection},
  author={Garc{\'\i}a-Vega, Manuel and D{\'\i}az-Galiano, MC and Garc{\'\i}a-Cumbreras, MA and Del Arco, FMP and Montejo-R{\'a}ez, A and Jim{\'e}nez-Zafra, SM and Mart{\'\i}nez C{\'a}mara, E and Aguilar, CA and Cabezudo, MAS and Chiruzzo, L and others},
  booktitle={Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020) Co-Located with 36th Conference of the Spanish Society for Natural Language Processing (SEPLN 2020), M{\'a}laga, Spain},
  pages={163--170},
  year={2020}
}