# 🚀 English Sentence Bias Detection Model
This is an English sequence classification model that detects bias and fairness in sentences (news articles), supporting more objective text analysis.
## 🚀 Quick Start
This English sequence classification model was trained on the MBAD dataset to detect bias and fairness in sentences (news articles). It was built on top of distilbert-base-uncased and trained for 30 epochs with a batch size of 16, a learning rate of 5e-5, and a maximum sequence length of 512.
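The training script itself is not included in this card; the following is a minimal sketch of what a fine-tuning run with the stated hyperparameters could look like. The optimizer and loss are assumptions (Adam with sparse categorical cross-entropy), and `texts` / `labels` are illustrative placeholders for the MBAD sentences and their bias labels.

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Hyperparameters stated in the model card
MODEL_NAME = "distilbert-base-uncased"
EPOCHS, BATCH_SIZE, LR, MAX_LEN = 30, 16, 5e-5, 512

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = TFAutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Placeholder data; in practice these come from the MBAD dataset
texts = ["There have been a protest by a group of people"]
labels = [0]  # 0 = non-biased, 1 = biased (illustrative label scheme)

enc = tokenizer(texts, truncation=True, padding="max_length",
                max_length=MAX_LEN, return_tensors="tf")
dataset = tf.data.Dataset.from_tensor_slices((dict(enc), labels)).batch(BATCH_SIZE)

# Optimizer/loss choices are assumptions, not from the original card
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=LR),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(dataset, epochs=EPOCHS)
```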
### Examples
Below are some example sentences and their labels:
- Biased examples
  - Example 1: “Nevertheless, Trump and other Republicans have tarred the protests as havens for terrorists intent on destroying property.”
  - Example 2: “Billie Eilish issues apology for mouthing an anti-Asian derogatory term in a resurfaced video.”
  - Example 3: “Christians should make clear that the perpetuation of objectionable vaccines and the lack of alternatives is a kind of coercion.”
- Non-biased examples
  - Example 1: “There have been a protest by a group of people”
  - Example 2: “While emphasizing he’s not singling out either party, Cohen warned about the danger of normalizing white supremacist ideology.”
## Model Information

| Property | Details |
|----------|---------|
| Model type | English sequence classification model |
| Training data | MBAD Data |
| Carbon emissions | 0.319355 kg |
## Model Performance

| Training accuracy (%) | Validation accuracy (%) | Training loss | Test loss |
|------------------------|---------------------------|---------------|-----------|
| 76.97 | 62.00 | 0.45 | 0.96 |
## 💻 Usage Examples
### Basic Usage
The simplest way to use this model is through the hosted Inference API on Hugging Face; alternatively, you can load it with the `pipeline` object from the transformers library:
```python
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
from transformers import pipeline

# Load the fine-tuned tokenizer and TensorFlow model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("d4data/bias-detection-model")
model = TFAutoModelForSequenceClassification.from_pretrained("d4data/bias-detection-model")

# Wrap them in a text-classification pipeline and classify a sentence
classifier = pipeline('text-classification', model=model, tokenizer=tokenizer)
classifier("The irony, of course, is that the exhibit that invites people to throw trash at vacuuming Ivanka Trump lookalike reflects every stereotype feminists claim to stand against, oversexualizing Ivanka’s body and ignoring her hard work.")
```
## 📄 License
This model is part of the research topic “Bias and Fairness in AI” conducted by Deepak John Reji and Shaina Raza. If you use this work (code, model, or dataset), please star the repository at the link below:
Bias & Fairness in AI, (2022), GitHub repository, https://github.com/dreji18/Fairness-in-AI