bert-election2020-twitter-stance-biden-KE-MLM: Open-Source Model for Detecting Stance toward Biden in 2020 US Election Tweets

Bert Election2020 Twitter Stance Biden KE MLM

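Developed by kornosk

A pre-trained language model built on the BERT-base architecture and fine-tuned to detect stance toward Joe Biden in tweets from the 2020 US election.

Text classification · English · License: GPL-3.0 · #political-stance-detection #tweet-text-analysis #knowledge-enhanced-pretraining

Downloads: 69

Released: 3/2/2022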

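Model Overview

The model is pre-trained with the knowledge-enhanced masked language model (KE-MLM) approach and then fine-tuned on a labeled tweet dataset to classify each tweet's stance toward Joe Biden as favor, against, or neutral.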

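Model Features

Knowledge-Enhanced Pre-training

Pre-trained with the knowledge-enhanced masked language model (KE-MLM) approach, which is designed to improve stance-detection accuracy.

Domain-Specific Optimization

Tuned specifically on political tweets from the 2020 US election, targeting political stance detection.

Three-Class Output

Distinguishes three stance classes: favor, against, and neutral.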
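To make the three-class output concrete, here is a minimal sketch of how three raw scores (logits) can be turned into probabilities and a stance label. The label order (0: AGAINST, 1: FAVOR, 2: NONE) is taken from the usage example in this document; the function names and the logits are made up for illustration and are not real model output.

```python
import math

# Label order taken from the id2label mapping in the usage example.
ID2LABEL = {0: "AGAINST", 1: "FAVOR", 2: "NONE"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_stance(logits):
    """Return (label, probabilities) for one tweet's three logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs


# Made-up logits for illustration only, not real model output.
label, probs = decode_stance([0.2, 2.5, -1.0])
print(label, [round(p, 3) for p in probs])
```

The label with the highest probability is reported as the prediction; the full probability list is useful when a downstream step wants to discard low-confidence predictions.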

Model Capabilities

Text classification

Stance detection

Political text analysis

Social media content analysis

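Use Cases

Political Analysis

Candidate Support Analysis

Measure how favor, against, and neutral stances toward Joe Biden are distributed across social media posts.

Provides a quantitative estimate of a candidate's popularity on social media.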
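As a rough illustration of the distribution analysis described here, the sketch below aggregates predicted labels into per-class shares. The function name and the sample labels are hypothetical; in practice the labels would come from running the model over a collection of tweets.

```python
from collections import Counter

LABELS = ("AGAINST", "FAVOR", "NONE")


def stance_distribution(predictions):
    """Return the share of each stance label among the predictions."""
    counts = Counter(predictions)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        # No predictions yet: report zero for every class.
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


# Hypothetical predicted labels, for illustration only.
sample = ["FAVOR", "AGAINST", "FAVOR", "NONE", "FAVOR"]
distribution = stance_distribution(sample)
print(distribution)
```

Shares computed this way describe the tweets that were sampled, not the electorate, so any popularity reading depends on how the tweets were collected.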

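Public Opinion Monitoring

Track, in near real time, how opinion about political figures shifts on social media.

Helps campaign teams adjust their strategy promptly.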
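One simple way to follow opinion shifts over time is to group predictions by day and compute a net score, (FAVOR - AGAINST) / total. The sketch below assumes each record is a hypothetical (date, label) pair; the score definition and the sample data are illustrative choices, not something the source prescribes.

```python
from collections import defaultdict
from datetime import date


def daily_net_stance(records):
    """Map each day to (FAVOR - AGAINST) / total predictions for that day."""
    tallies = defaultdict(lambda: {"AGAINST": 0, "FAVOR": 0, "NONE": 0})
    for day, label in records:
        tallies[day][label] += 1
    trend = {}
    for day in sorted(tallies):  # chronological order
        counts = tallies[day]
        total = sum(counts.values())
        trend[day] = (counts["FAVOR"] - counts["AGAINST"]) / total
    return trend


# Hypothetical (date, predicted label) pairs, for illustration only.
records = [
    (date(2020, 10, 1), "FAVOR"),
    (date(2020, 10, 1), "AGAINST"),
    (date(2020, 10, 1), "FAVOR"),
    (date(2020, 10, 2), "AGAINST"),
    (date(2020, 10, 2), "NONE"),
]
trend = daily_net_stance(records)
for day, score in trend.items():
    print(day.isoformat(), round(score, 2))
```

A score near 1 means mostly favorable tweets that day, near -1 mostly opposing ones; days with few tweets give noisy scores and may need a minimum-count filter.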

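Academic Research

Political Communication Research

Study how political messages spread on social media and what effect they have.

Supplies data to support political communication research.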

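🚀 Pre-trained BERT on Twitter US Election 2020 for Stance Detection towards Joe Biden (KE-MLM)

This project provides the pre-trained weights of the KE-MLM model from "Knowledge Enhanced Masked Language Model for Stance Detection" (NAACL 2021). The model detects stance toward Joe Biden in tweets, which makes it useful for political analysis.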

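✨ Key Features

Pre-trained on more than 5 million English tweets about the 2020 US presidential election.
Fine-tuned on stance-labeled data for stance detection toward Joe Biden.
Initialized from BERT-base, trained with the normal MLM objective, with the classification layer fine-tuned for stance detection toward Joe Biden.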

📦 Installation

The documentation gives no specific installation steps; see the official repository for setup information.
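Since no install steps are documented, one practical starting point is to check that the packages imported by the usage example (transformers, torch, numpy) are present. The sketch below is an assumption-based helper, not part of the official project; each of the three packages is normally installed with pip under the same name.

```python
import importlib.util

# Packages imported by the usage example in this document.
REQUIRED = ("transformers", "torch", "numpy")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All packages used by the usage example are available.")
```

The check only confirms that the packages can be located; it says nothing about which versions the model requires, which the source does not state.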

💻 Usage Examples

Basic Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import numpy as np

# choose GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# select model path here
pretrained_LM_path = "kornosk/bert-election2020-twitter-stance-biden-KE-MLM"

# load model and move it to the selected device
tokenizer = AutoTokenizer.from_pretrained(pretrained_LM_path)
model = AutoModelForSequenceClassification.from_pretrained(pretrained_LM_path)
model.to(device)
model.eval()  # inference mode: disables dropout

# class index -> stance label
id2label = {
    0: "AGAINST",
    1: "FAVOR",
    2: "NONE"
}

##### Prediction Neutral #####
sentence = "Hello World."
inputs = tokenizer(sentence.lower(), return_tensors="pt").to(device)
with torch.no_grad():  # no gradients needed for inference
    outputs = model(**inputs)
# outputs[0] holds the logits; softmax turns them into class probabilities
predicted_probability = torch.softmax(outputs[0], dim=1)[0].tolist()

print("Sentence:", sentence)
print("Prediction:", id2label[np.argmax(predicted_probability)])
print("Against:", predicted_probability[0])
print("Favor:", predicted_probability[1])
print("Neutral:", predicted_probability[2])

##### Prediction Favor #####
sentence = "Go Go Biden!!!"
inputs = tokenizer(sentence.lower(), return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)
predicted_probability = torch.softmax(outputs[0], dim=1)[0].tolist()

print("Sentence:", sentence)
print("Prediction:", id2label[np.argmax(predicted_probability)])
print("Against:", predicted_probability[0])
print("Favor:", predicted_probability[1])
print("Neutral:", predicted_probability[2])

##### Prediction Against #####
sentence = "Biden is the worst."
inputs = tokenizer(sentence.lower(), return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)
predicted_probability = torch.softmax(outputs[0], dim=1)[0].tolist()

print("Sentence:", sentence)
print("Prediction:", id2label[np.argmax(predicted_probability)])
print("Against:", predicted_probability[0])
print("Favor:", predicted_probability[1])
print("Neutral:", predicted_probability[2])

# please consider citing our paper if you feel this is useful :)

📚 Documentation

This pre-trained language model is fine-tuned for stance detection toward Joe Biden. See the official repository for more details.

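🔧 Technical Details

Training Data

The model is pre-trained on more than 5 million English tweets about the 2020 US presidential election, then fine-tuned on our stance-labeled data for stance detection toward Joe Biden.

Training Objective

The model is initialized from BERT-base and trained with the normal MLM objective, with the classification layer fine-tuned for stance detection toward Joe Biden.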
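For readers unfamiliar with the MLM objective, the sketch below shows the standard BERT masking recipe: about 15% of tokens are selected for prediction, and of those 80% become [MASK], 10% become a random token, and 10% stay unchanged. Whether the authors used exactly these settings is an assumption; the function name, toy vocabulary, and whitespace tokenization are illustrative only (BERT itself operates on WordPiece tokens).

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Apply BERT-style masking; return (corrupted tokens, labels).

    labels[i] holds the original token at positions selected for
    prediction and None everywhere else.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for token in tokens:
        if rng.random() >= mask_prob:
            # Not selected: keep the token, nothing to predict here.
            corrupted.append(token)
            labels.append(None)
            continue
        labels.append(token)  # the model must recover this token
        roll = rng.random()
        if roll < 0.8:
            corrupted.append(MASK_TOKEN)         # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted.append(rng.choice(vocab))  # 10%: random token
        else:
            corrupted.append(token)              # 10%: leave unchanged
    return corrupted, labels


tokens = "go go biden the election is close".split()
vocab = ["vote", "debate", "poll", "rally"]  # toy vocabulary (hypothetical)
# mask_prob is raised to 0.5 so the short example shows some masking.
corrupted, labels = mask_tokens(tokens, vocab, mask_prob=0.5, seed=0)
print(corrupted)
print(labels)
```

During pre-training, the loss is computed only at positions where the label is not None, so the model learns to reconstruct the selected tokens from their context.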

📄 License

This project is released under the GPL-3.0 license.

📖 References

Knowledge Enhanced Masked Language Model for Stance Detection, NAACL 2021.

📚 Citation

@inproceedings{kawintiranon2021knowledge,
    title={Knowledge Enhanced Masked Language Model for Stance Detection},
    author={Kawintiranon, Kornraphop and Singh, Lisa},
    booktitle={Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
    year={2021},
    publisher={Association for Computational Linguistics},
    url={https://www.aclweb.org/anthology/2021.naacl-main.376}
}