xlm-roberta-large-manifesto: an open-source multilingual model for political text classification


Xlm Roberta Large Manifesto

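Developed by poltextlab

An xlm-roberta-large model fine-tuned on multilingual training data for zero-shot text classification, using the Manifesto Project coding scheme.

Text Classification

Transformers

License: MIT #multilingual political text classification #Manifesto Project coding #zero-shot classification

Downloads: 124

Released: 8/4/2023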

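Model Overview

A multilingual text classification model fine-tuned from the xlm-roberta-large architecture. It is built for political text analysis and follows the Manifesto Project coding scheme.

Model Features

Multilingual support

The model classifies text in multiple languages.

Manifesto Project coding scheme

Labels follow the annotation scheme of the Manifesto Project Dataset Codebook, version 2020b.

Zero-shot classification

The model classifies text without additional domain-specific training.

Model Capabilities

Multilingual text classification

Political text analysis

Zero-shot learning

Use Cases

Political text analysis

Policy statement classification

Classify and analyze government policy statements.

Political manifesto coding

Code political texts according to the Manifesto Project coding scheme.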

🚀 xlm-roberta-large-manifesto

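This xlm-roberta-large model was fine-tuned on multilingual training data labeled with the Manifesto Project coding scheme. It can be applied to tasks such as zero-shot classification and text classification.

🚀 Quick Start

Using the model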

from transformers import AutoTokenizer, pipeline

# Load the tokenizer from the base xlm-roberta-large checkpoint.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")

# The model is gated, so a Hugging Face access token must be supplied.
pipe = pipeline(
    model="poltextlab/xlm-roberta-large-manifesto",
    task="text-classification",
    tokenizer=tokenizer,
    use_fast=False,
    token="<your_hf_read_only_token>"
)

text = "We will place an immediate 6-month halt on the finance driven closure of beds and wards, and set up an independent audit of needs and facilities."
# Returns a list of {"label", "score"} dictionaries for the input text.
pipe(text)
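The pipeline call returns dictionaries with a "label" and a "score" key. The following is a minimal sketch of how that output could be ranked; the helper name top_labels and the label names in the sample data are hypothetical (the real label names come from the model's configuration), and the scores are made up for illustration.

```python
from typing import Any, Dict, List


def top_labels(output: List[Any], k: int = 3) -> List[Dict[str, Any]]:
    """Return the k highest-scoring {"label", "score"} entries."""
    # Some pipeline calls return a nested list (one inner list per input
    # text). This sketch handles a single text, so unwrap one level.
    if output and isinstance(output[0], list):
        output = output[0]
    return sorted(output, key=lambda item: item["score"], reverse=True)[:k]


# Hypothetical output for one sentence; not real model predictions.
sample_output = [
    {"label": "LABEL_A", "score": 0.10},
    {"label": "LABEL_B", "score": 0.70},
    {"label": "LABEL_C", "score": 0.20},
]

best_two = top_labels(sample_output, k=2)
print([item["label"] for item in best_two])
```

In recent Transformers releases the text-classification pipeline accepts a top_k argument at call time, so something like top_labels(pipe(text, top_k=5)) should work; treat that as an assumption to check against the installed version.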

Gated access

The model is gated, so the token parameter must be passed when loading it. Earlier versions of the Transformers package may require the use_auth_token parameter instead.
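One way to handle both parameter names is to choose the keyword from the installed Transformers version. This is a sketch under stated assumptions: the helper auth_kwargs is hypothetical, the cutoff version (4.32) is an assumption that should be verified against the Transformers changelog, and reading the token from the HF_TOKEN environment variable is only a convenience.

```python
import os
from typing import Dict, Optional

# Assumption: the first Transformers release that accepts `token`.
# Verify this against the changelog of the version you have installed.
TOKEN_KWARG_SINCE = (4, 32)


def auth_kwargs(transformers_version: str, token: Optional[str] = None) -> Dict[str, str]:
    """Build the authentication keyword argument for a gated model."""
    # Fall back to the HF_TOKEN environment variable if no token is passed.
    token = token or os.environ.get("HF_TOKEN")
    if not token:
        return {}
    # Compare only major and minor, so "4.32.0.dev0" parses cleanly.
    major, minor = (int(part) for part in transformers_version.split(".")[:2])
    name = "token" if (major, minor) >= TOKEN_KWARG_SINCE else "use_auth_token"
    return {name: token}
```

The result can be unpacked into the loading call, for example pipeline(model=..., task="text-classification", **auth_kwargs(transformers.__version__)), which keeps the access token out of the source file.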

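✨ Key Features

Multilingual support: classifies text in multiple languages and accepts input in different languages.
Domain-specific coding scheme: trained on data labeled with the Manifesto Project coding scheme, which makes it suited to political text.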


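📚 Documentation

Model description

An xlm-roberta-large model fine-tuned on multilingual training data labeled with the Manifesto Project coding scheme. The labels follow Version 2020b (December 23, 2020) of the Manifesto Project Dataset Codebook.

Model performance

The model was evaluated on a test set of 305141 examples. The test set was drawn by stratified sampling: for each label, 20% of its occurrences were selected at random.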
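The per-label 20% selection described above can be sketched as follows. This is an illustration of the general technique, not the authors' actual splitting code: the function name, the fixed seed, and the rounding of each label's test count are assumptions.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[Hashable], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split example indices so each label contributes test_fraction to the test set."""
    # Group example indices by their label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)
    train, test = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        # Assumption: round to the nearest whole example per label.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy data: 10 examples of label "a" and 5 of label "b".
toy_labels = ["a"] * 10 + ["b"] * 5
train_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(test_idx))
```

Sampling within each label keeps the label distribution of the test set close to that of the full data, which matters when some codes are rare.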

Precision, recall, and F1 score are weighted macro averages.

Metric	Value
Precision	0.6495
Recall	0.6547
F1 score	0.6507
Accuracy	0.6547
Top-3 accuracy	0.8505
Top-5 accuracy	0.9073
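To make the metrics concrete, here is a small sketch of support-weighted recall and top-k accuracy on toy data. It assumes the usual definitions (per-label recall weighted by label frequency; a top-k hit when the true label is among the k highest-ranked predictions); the source does not state how the figures were computed, and the function names are hypothetical. Under this definition weighted recall always equals plain accuracy, which is consistent with both being reported as 0.6547.

```python
from collections import Counter
from typing import List, Sequence


def weighted_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-label recall, averaged with weights equal to each label's frequency."""
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    return sum(
        (support[label] / total) * (correct[label] / support[label])
        for label in support
    )


def top_k_accuracy(y_true: Sequence[str], ranked: Sequence[List[str]], k: int) -> float:
    """Share of examples whose true label is among the k top-ranked predictions."""
    hits = sum(1 for t, labels in zip(y_true, ranked) if t in labels[:k])
    return hits / len(y_true)


# Toy data with two labels; each ranking lists labels from most to least likely.
y_true = ["a", "a", "a", "b"]
ranked = [["a", "b"], ["a", "b"], ["b", "a"], ["b", "a"]]
y_pred = [labels[0] for labels in ranked]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(weighted_recall(y_true, y_pred), accuracy)
print(top_k_accuracy(y_true, ranked, 1), top_k_accuracy(y_true, ranked, 2))
```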