Comment-Moderation: an open-source content moderation system that detects and classifies harmful content in comments

Comment Moderation

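Developed by Vrandan

A multi-label content moderation system built on the DistilBERT architecture. It detects and classifies potentially harmful content in user comments and is both accurate and lightweight.

Text classification

Transformers

Language: English  License: OpenRAIL  #multi-label content moderation #lightweight deployment #edge-device optimization

Downloads: 45.47k

Released: 1/23/2025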

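Model overview

The model is an efficient content moderation tool that sorts text into 9 content categories (8 types of potentially harmful content plus a safe `OK` class). It is suited to deployment on edge devices and in mobile apps.

Model features

High accuracy

The model reaches 95.4% accuracy on the text moderation task.

Multi-label classification

Scores all 9 content categories at once.

Lightweight deployment

A compact 67M parameters, suitable for deployment on edge devices and in mobile apps.

Low-latency inference

The optimized architecture responds quickly, which suits real-time content analysis.

Trained on consumer hardware

The model was trained on an NVIDIA RTX 3080, showing that it can be developed on consumer-grade hardware.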

Model capabilities

Text classification

Content moderation

Comment moderation

Harmful content detection

Multi-label classification

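Use cases

Social media moderation

User comment moderation

Automatically detects whether user comments on social media platforms contain harmful content

Classifies comments into 9 content categories with 95.4% accuracy

Online community management

Forum content filtering

Automatically filters inappropriate posts in forums

Reduces manual review workload and improves the quality of community content

Application integration

Mobile app content moderation

Integrates into mobile apps to moderate user-generated content in real time

The lightweight model suits mobile deployment, and its low latency does not degrade the user experience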

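🛡️ Comment Moderation Model

This is a powerful multi-label content moderation system built on the DistilBERT architecture and designed to detect and classify potentially harmful content in user-generated comments with high accuracy. According to the author, it is currently the best-performing model on the provided text moderation dataset and also has the smallest footprint, which makes it well suited to deployment on edge devices. The author also describes it as currently the only model on Hugging Face trained to reach this level of performance while staying this small relative to its training data.

🚀 Quick Start

Local Python implementation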

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the model and tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained("Vrandan/Comment-Moderation")
tokenizer = AutoTokenizer.from_pretrained("Vrandan/Comment-Moderation")
model.eval()  # inference mode (disables dropout)

def analyze_text(text):
    # Truncate inputs that exceed the model's maximum sequence length
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():  # no gradients are needed for inference
        outputs = model(**inputs)
    probabilities = outputs.logits.softmax(dim=-1).squeeze(0).tolist()

    # Pair each label with its probability, highest first
    labels = [model.config.id2label[i] for i in range(len(probabilities))]
    predictions = sorted(zip(labels, probabilities), key=lambda x: x[1], reverse=True)

    return predictions

# Example usage
text = "Your text here"
results = analyze_text(text)
for label, prob in results:
    print(f"Label: {label} - Probability: {prob:.4f}")

Example output:

Label: OK - Probability: 0.9840
Label: H - Probability: 0.0043
Label: SH - Probability: 0.0039
Label: V - Probability: 0.0019
Label: S - Probability: 0.0018
Label: HR - Probability: 0.0015
Label: V2 - Probability: 0.0011
Label: S3 - Probability: 0.0010
Label: H2 - Probability: 0.0006
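
A ranked list of scores still has to be turned into a moderation decision. The following is a minimal sketch of one way to do that, assuming the `(label, probability)` pairs returned by `analyze_text`. The helper name `flag_comment` and the 0.5 threshold are illustrative assumptions, not part of the model card; a real threshold should be tuned on your own data.

```python
def flag_comment(predictions, threshold=0.5):
    """Return the harmful labels whose probability meets the threshold.

    `predictions` is a list of (label, probability) pairs. `threshold` is
    an illustrative value, not one recommended by the model author.
    """
    return [label for label, prob in predictions
            if label != "OK" and prob >= threshold]


# Scores copied from the example output above
example = [("OK", 0.9840), ("H", 0.0043), ("SH", 0.0039), ("V", 0.0019),
           ("S", 0.0018), ("HR", 0.0015), ("V2", 0.0011), ("S3", 0.0010),
           ("H2", 0.0006)]

flagged = flag_comment(example)
print("flagged:", flagged)  # empty list: no harmful label reaches 0.5
```

An empty result means the comment passes; a non-empty one lists the categories that triggered the flag, which can be logged or routed to a human reviewer.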

Serverless Python implementation

import requests

API_URL = "https://api-inference.huggingface.co/models/Vrandan/Comment-Moderation"
# Replace the placeholder with your own Hugging Face access token
headers = {"Authorization": "Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "Your text here",
})
print(output)
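
The JSON returned by `query` needs a little unpacking before use. The sketch below assumes the response is a list of `{"label", "score"}` objects, possibly wrapped in an outer list, and that failures arrive as a dictionary (for example with an `"error"` key). These shapes are assumptions about the hosted API rather than something stated in this document, and `top_label` and the sample values are hypothetical.

```python
def top_label(output):
    """Return (label, score) for the highest-scoring class, or None.

    Assumes `output` is either a list of {"label", "score"} dicts, the same
    list wrapped in an outer list, or a dict describing an error.
    """
    if isinstance(output, dict):  # assumed error payload
        return None
    if output and isinstance(output[0], list):  # unwrap one nesting level
        output = output[0]
    if not output:
        return None
    best = max(output, key=lambda item: item["score"])
    return best["label"], best["score"]


# Hypothetical response, shaped like the assumption described above
sample = [[{"label": "OK", "score": 0.984}, {"label": "H", "score": 0.0043}]]
print(top_label(sample))
```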

Node.js JavaScript implementation

require('dotenv').config();
const { HfInference } = require('@huggingface/inference');
const readline = require('readline');

// Initialize the Hugging Face client
// To use this, follow these steps:
// 1. Create a `.env` file in the root directory of your project.
// 2. Visit https://huggingface.co/settings/tokens to generate your access token (you may need to create an account if you haven't already).
// 3. Add the token to your `.env` file like this:
//    HUGGING_FACE_ACCESS_TOKEN=your_token_here
// 4. Install dotenv & huggingface/inference package (`npm install dotenv` & `npm install @huggingface/inference`) and load it in your project.
const hf = new HfInference(process.env.HUGGING_FACE_ACCESS_TOKEN);

// Create readline interface
const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
});

async function analyzeText(text) {
    try {
        const result = await hf.textClassification({
            model: 'Vrandan/Comment-Moderation',
            inputs: text
        });
        
        console.log('\nResults:');
        result.forEach(pred => {
            console.log(`Label: ${pred.label} - Probability: ${pred.score.toFixed(4)}`);
        });
    } catch (error) {
        console.error('Error analyzing text:', error.message);
    }
}

async function main() {
    while (true) {
        try {
            const text = await new Promise(resolve => {
                rl.question('\nEnter text to analyze (or "quit" to exit): ', resolve);
            });

            if (text.toLowerCase() === 'quit') break;
            if (text.trim()) await analyzeText(text);
        } catch (error) {
            console.error('Error:', error.message);
        }
    }
    rl.close();
}

main().catch(console.error);

Serverless JavaScript implementation

async function query(data) {
    const response = await fetch(
        "https://api-inference.huggingface.co/models/Vrandan/Comment-Moderation",
        {
            headers: {
                Authorization: "Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
                "Content-Type": "application/json",
            },
            method: "POST",
            body: JSON.stringify(data),
        }
    );
    const result = await response.json();
    return result;
}

query({"inputs": "Your text here"}).then((response) => {
    console.log(JSON.stringify(response));
});

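✨ Key features

Multi-label classification
Real-time content analysis
95.4% accuracy
9 distinct content categories
Easy integration through the API or a local implementation
Lightweight deployment footprint
Suitable for edge devices and mobile applications
Low-latency inference
Resource-efficient while maintaining high accuracy
Runs on consumer-grade hardware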

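📊 Content categories

The model assigns text to the following categories of potentially harmful content, plus one safe category:

Category	Label	Definition
Sexual	`S`	Content meant to arouse sexual excitement, such as descriptions of sexual activity, or that promotes sexual services (excluding sex education and wellness content).
Hate	`H`	Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste.
Violence	`V`	Content that promotes or glorifies violence, or celebrates the suffering or humiliation of others.
Harassment	`HR`	Content that may be used to torment or harass individuals in real life, or that makes harassment more likely to occur.
Self-harm	`SH`	Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.
Sexual/minors	`S3`	Sexual content that includes an individual under 18 years old.
Hate/threat	`H2`	Hateful content that also includes violence or serious harm toward the targeted group.
Violence/graphic	`V2`	Violent content that depicts death, violence, or serious physical injury in extreme graphic detail.
Safe content	`OK`	Appropriate content that violates no guidelines.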
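
Because the model returns only the short label codes, downstream code usually needs a lookup back to readable names. A minimal sketch, with the mapping transcribed from the table above; the names `CATEGORY_NAMES` and `describe` are hypothetical helpers, not part of the model or its configuration.

```python
# Label codes and category names transcribed from the table above
CATEGORY_NAMES = {
    "S": "Sexual",
    "H": "Hate",
    "V": "Violence",
    "HR": "Harassment",
    "SH": "Self-harm",
    "S3": "Sexual/minors",
    "H2": "Hate/threat",
    "V2": "Violence/graphic",
    "OK": "Safe content",
}


def describe(label):
    """Map a label code to its category name; unknown codes pass through."""
    return CATEGORY_NAMES.get(label, label)


print(describe("HR"))
print(len(CATEGORY_NAMES), "categories")
```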

📈 Performance metrics

Accuracy: 95.4%
Mean ROC AUC: 0.912
Macro F1 score: 0.407
Micro F1 score: 0.802

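📊 Detailed model performance

The model has been evaluated extensively using standard classification metrics:

Loss: 0.641
Accuracy: 0.954 (95.4%)
Macro F1 score: 0.407
Micro F1 score: 0.802
Weighted F1 score: 0.763
Macro precision: 0.653
Micro precision: 0.875
Weighted precision: 0.838
Macro recall: 0.349
Micro recall: 0.740
Weighted recall: 0.740
Mean ROC AUC: 0.912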
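
The gap between the macro F1 (0.407) and micro F1 (0.802) scores is the pattern one would expect when frequent classes are predicted well and rare classes poorly: macro F1 averages the per-class scores with equal weight, while micro F1 pools the counts across classes. The sketch below illustrates the difference with made-up counts for two hypothetical classes; the numbers are not taken from this model's evaluation.

```python
def f1(tp, fp, fn):
    """F1 score from true-positive, false-positive, and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical (tp, fp, fn) counts: one common class, one rare class
counts = {
    "frequent": (900, 50, 50),
    "rare": (2, 3, 15),
}

# Macro: compute F1 per class, then take the unweighted mean
per_class = {name: f1(*c) for name, c in counts.items()}
macro_f1 = sum(per_class.values()) / len(per_class)

# Micro: sum the counts over all classes, then compute a single F1
tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
micro_f1 = f1(tp, fp, fn)

print(f"macro F1: {macro_f1:.3f}")
print(f"micro F1: {micro_f1:.3f}")
```

With these counts the rare class drags the macro average down while barely moving the micro score, so per-class metrics are worth checking before relying on the model for the less common categories.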

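🔧 Technical details

The model was trained on an NVIDIA RTX 3080 GPU in a home setup, which shows that effective content moderation models can be developed on consumer-grade hardware. This makes model development more accessible to individual developers and small organizations.

Key training specifications:

Hardware: NVIDIA RTX 3080
Base model: DistilBERT
Model size: 67 million parameters (optimized for efficient deployment)
Training environment: local workstation
Training type: fine-tuning

Although the model is relatively compact (67 million parameters), it achieves strong performance metrics and is suitable for deployment across a range of devices and environments. Its ratio of performance to size suggests that effective content moderation is possible without extensive computational resources.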
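
To put the parameter count in deployment terms, the storage needed for the weights can be estimated as parameters times bytes per parameter. This back-of-the-envelope sketch covers weights only and ignores activations, tokenizer files, and runtime overhead. The `float16` and `int8` rows are arithmetic only; whether this model keeps its accuracy at reduced precision is not stated in the source.

```python
PARAMS = 67_000_000  # parameter count stated above

# Storage cost per parameter for common numeric formats
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_size_mb(params, dtype):
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e6


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_size_mb(PARAMS, dtype):.0f} MB")
```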

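⚠️ Important notes

Ethical use

Monitor for bias regularly
Take context into account when implementing
Put privacy first

Limitations

May miss contextual nuances
False positives are possible
Affected by differences in cultural context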

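📚 Dataset information

The model was trained on the dataset released by OpenAI and described in their paper "A Holistic Approach to Undesired Content Detection".

Dataset sources

📄 Original paper (PDF)
💾 Dataset repository

Citation

If you use this model or dataset in your research, please cite: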

@article{openai2022moderation,
  title={A Holistic Approach to Undesired Content Detection},
  author={Todor Markov and Chong Zhang and Sandhini Agarwal and Tyna Eloundou and Teddy Lee and Steven Adler and Angela Jiang and Lilian Weng},
  journal={arXiv preprint arXiv:2208.03274},
  year={2022}
}