ai-detector: Open-Source Model for Accurately Detecting AI-Generated Text

Ai Detector

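Developed by SuperAnnotate

A generated-text detection model fine-tuned from RoBERTa Large, used to identify AI-generated content

Text Classification

Transformers

English | License: Other | #generated-text-detection #multi-model-coverage #anti-cheating-in-education

Downloads: 2,160

Released: 9/25/2024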

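Model Overview

The model is designed to detect generated (synthetic) text. This capability matters for filtering training data and for identifying fraud and cheating in scientific and educational settings.

Model Features

Balanced training data

Trained on 44k balanced samples comprising human-written text and content generated by 14 LLMs

Multi-domain coverage

Training data spans three domains: Wikipedia, Reddit Q&A, and scientific papers

Overfitting countermeasures

Key n-grams are identified with a chi-squared test and removed, so that the model learns genuine features rather than surface patterns

Good calibration

A tuned loss function with label smoothing keeps prediction confidence in line with actual accuracy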

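Model Capabilities

Detecting AI-generated text

Identifying large-language-model content

Distinguishing human writing from machine generation

Use Cases

Education

Academic integrity checks

Identifying AI-generated content in student assignments

Detects GPT-4-generated text with 98.5% accuracy

Data filtering

Training-data cleaning

Filtering synthetic text out of datasets

Detects LLaMA-Chat-generated content with 98% accuracy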

🚀 SuperAnnotate - AI Detector

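SuperAnnotate's AI Detector is a model fine-tuned from RoBERTa Large to detect generated or synthetic text. This capability is important for determining text authorship and for detecting fraud and cheating, with notable applications in scientific research and education.

🚀 Quick Start

Prerequisites

Install generated_text_detector by running the following command: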

pip install git+https://github.com/superannotateai/generated_text_detector.git@v1.1.0

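✨ Key Features

Detects generated or synthetic text, which is important for determining text authorship and for detecting fraud and cheating.
Applicable to training-data curation, scientific research, and education.
Fine-tuned from a pretrained RoBERTa model, with high accuracy on most of the evaluated generators.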

💻 Usage Examples

Basic Usage

from generated_text_detector.utils.model.roberta_classifier import RobertaClassifier
from generated_text_detector.utils.preprocessing import preprocessing_text
from transformers import AutoTokenizer
import torch.nn.functional as F


model = RobertaClassifier.from_pretrained("SuperAnnotate/ai-detector")
tokenizer = AutoTokenizer.from_pretrained("SuperAnnotate/ai-detector")

model.eval()

text_example = "It's not uncommon for people to develop allergies or intolerances to certain foods as they get older. It's possible that you have always had a sensitivity to lactose (the sugar found in milk and other dairy products), but it only recently became a problem for you. This can happen because our bodies can change over time and become more or less able to tolerate certain things. It's also possible that you have developed an allergy or intolerance to something else that is causing your symptoms, such as a food additive or preservative. In any case, it's important to talk to a doctor if you are experiencing new allergy or intolerance symptoms, so they can help determine the cause and recommend treatment."

text_example = preprocessing_text(text_example)

tokens = tokenizer.encode_plus(
   text_example,
   add_special_tokens=True,
   max_length=512,
   padding='longest',
   truncation=True,
   return_token_type_ids=True,
   return_tensors="pt"
)

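# The custom classifier returns a tuple; the second element holds the logits.
_, logits = model(**tokens)

# Sigmoid maps the single output logit to a score between 0 and 1.
proba = F.sigmoid(logits).squeeze(1).item()

print(proba)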
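
The printed value is a single score between 0 and 1. As a minimal sketch of turning it into a label, assuming that higher scores indicate generated text and using a hypothetical cut-off of 0.5 (neither the helper name nor the threshold comes from the model card):

```python
def label_from_probability(proba: float, threshold: float = 0.5) -> str:
    """Map a detector score in [0, 1] to a coarse label.

    Both this helper and its default threshold are hypothetical; tune the
    threshold on your own validation data.
    """
    if not 0.0 <= proba <= 1.0:
        raise ValueError("proba must lie in [0, 1]")
    return "generated" if proba >= threshold else "human"


print(label_from_probability(0.93))
print(label_from_probability(0.12))
```

A stricter threshold trades missed detections for fewer false accusations, which matters given the lower accuracy reported on human-written text.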

Advanced Usage

from generated_text_detector.utils.text_detector import GeneratedTextDetector


detector = GeneratedTextDetector(
    "SuperAnnotate/ai-detector",
    device="cuda",
    preprocessing=True
)

text_example = "It's not uncommon for people to develop allergies or intolerances to certain foods as they get older. It's possible that you have always had a sensitivity to lactose (the sugar found in milk and other dairy products), but it only recently became a problem for you. This can happen because our bodies can change over time and become more or less able to tolerate certain things. It's also possible that you have developed an allergy or intolerance to something else that is causing your symptoms, such as a food additive or preservative. In any case, it's important to talk to a doctor if you are experiencing new allergy or intolerance symptoms, so they can help determine the cause and recommend treatment."

res = detector.detect_report(text_example)

print(res)

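📚 Detailed Documentation

Model Details

Model Description

Property	Details
Model type	Custom architecture for binary classification with a single output label, built on a pretrained RoBERTa.
Language	Primarily English.
License	SAIPL
Fine-tuned from	RoBERTa Large

Model Sources

Repository: GitHub (for the HTTP service).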

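Training Data

The training dataset for this version contains 44k text-label pairs, split evenly into two parts:

Custom generation: the first half was generated with custom, specially designed prompts, with the human-written versions drawn from three domains:
- Wikipedia
- Reddit ELI5 QA
- Scientific papers (extended to full texts including all sections)
The texts were generated by 14 different models from four major LLM families (GPT, LLaMA, Anthropic, and Mistral). Each sample pairs a single prompt with one human-written response and one generated response, but the prompt itself is not included in the training input.
Stratified subset of the RAID train data: the second half is a carefully selected stratified subset of the RAID train dataset, ensuring equal representation across domains, model types, and attack methods. Each example pairs a human-written text with a corresponding machine-generated response produced by a single model with specific parameters and attacks.

This balanced structure keeps human and generated samples in roughly equal proportion and ensures that every prompt is aligned with one authentic answer and one generated answer.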

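⚠️ Important Note

In addition, a chi-squared test was used to identify the key n-grams (n from 2 to 5) most strongly correlated with the target label, and these n-grams were then removed from the training data.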
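
To make the n-gram scoring step concrete, here is a minimal pure-Python sketch, assuming whitespace tokenization and a 2x2 contingency table of n-gram presence against the binary label. The function names and the toy texts are illustrative, and the authors' actual tokenization, selection threshold, and removal procedure are not described in the source.

```python
from collections import Counter


def ngrams(tokens, n_min=2, n_max=5):
    """Return the set of all n-grams (as tuples) with n_min <= n <= n_max."""
    return {
        tuple(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def chi2_scores(texts, labels):
    """Chi-squared statistic of each n-gram's presence against a binary label."""
    docs = [ngrams(t.lower().split()) for t in texts]
    total = len(docs)
    positives = sum(labels)
    present = Counter(g for d in docs for g in d)
    present_pos = Counter(g for d, y in zip(docs, labels) if y for g in d)
    scores = {}
    for gram, n_present in present.items():
        a = present_pos[gram]        # n-gram present, label 1
        b = n_present - a            # n-gram present, label 0
        c = positives - a            # n-gram absent, label 1
        d = total - positives - b    # n-gram absent, label 0
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[gram] = total * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Toy data only: 1 = generated, 0 = human.
texts = [
    "as an ai language model i cannot answer",
    "as an ai language model i think so",
    "honestly i have no idea mate",
    "i think it depends on the weather",
]
labels = [1, 1, 0, 0]
scores = chi2_scores(texts, labels)
for gram in sorted(scores, key=scores.get, reverse=True)[:3]:
    print(" ".join(gram), round(scores[gram], 2))
```

N-grams that appear almost exclusively in one class get the highest scores; removing them pushes a classifier away from memorizing such giveaway phrases.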

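Features

During training, one priority was not only to maximize prediction quality but also to avoid overfitting and to obtain a predictor whose confidence is adequate. We are pleased to have achieved both well-calibrated confidence and high prediction accuracy.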
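
Calibration means that stated confidence tracks observed accuracy. One common way to quantify it is the expected calibration error (ECE); the sketch below is a generic binary-classification version with an assumed 0.5 decision threshold and 10 equal-width bins. The source does not say which calibration metric the authors used, so treat this as an illustration rather than their evaluation code.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """ECE for binary predictions: weighted gap between confidence and accuracy."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1.0 - p     # confidence in the predicted class
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if pred == y else 0.0))
    total = len(probs)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(a for _, a in items) / len(items)
        ece += len(items) / total * abs(avg_conf - accuracy)
    return ece


# Toy scores: ten predictions at 0.9 confidence, nine of them correct.
well_calibrated = expected_calibration_error([0.9] * 10, [1] * 9 + [0])
# Toy scores: four predictions at 0.99 confidence, only half correct.
overconfident = expected_calibration_error([0.99] * 4, [1, 1, 0, 0])
print(well_calibrated, overconfident)
```

A value near zero indicates that confidence and accuracy agree; larger values indicate over- or under-confidence.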

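Training Details

A custom architecture was chosen because it performs binary classification with a single model output and has a customizable smoothing setting integrated into its loss function.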
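
As an illustration of how smoothing can be built into a single-output binary loss, the sketch below uses one common formulation in which the hard label y becomes y(1 - s) + s/2 before binary cross-entropy is applied to the logit. This exact formula is an assumption; the source states only that smoothing is integrated into the loss and lists a label smoothing value of 0.38 among the training parameters.

```python
import math


def smoothed_bce_with_logit(logit: float, label: int, smoothing: float = 0.38) -> float:
    """Binary cross-entropy on one logit with a label-smoothed target.

    Assumed smoothing rule: target = label * (1 - smoothing) + smoothing / 2.
    """
    target = label * (1.0 - smoothing) + 0.5 * smoothing
    # Numerically stable form of -[t*log(sigmoid(z)) + (1 - t)*log(1 - sigmoid(z))].
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


# With smoothing 0.38 the targets are 0.81 (label 1) and 0.19 (label 0), so the
# loss is minimized at a finite logit instead of rewarding ever larger ones.
best_logit = math.log(0.81 / 0.19)
print(smoothed_bce_with_logit(best_logit, 1))
print(smoothed_bce_with_logit(10.0, 1))
```

Because extreme logits are penalized, the sigmoid outputs stay away from 0 and 1, which is one way label smoothing can support better-calibrated scores.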

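Training parameters:

Base model: FacebookAI/roberta-large
Epochs: 20
Learning rate: 5e-05
Weight decay: 0.0033
Label smoothing: 0.38
Warmup epochs: 2
Optimizer: SGD
Gradient clipping: 3.0
Scheduler: cosine with hard restarts
Scheduler cycles: 6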
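
To show what a cosine schedule with hard restarts and warm-up looks like with these settings, here is a minimal sketch of the learning-rate multiplier. It follows the formula used by the Hugging Face `get_cosine_with_hard_restarts_schedule_with_warmup` helper, but the source does not say which implementation was used, and `steps_per_epoch` is a hypothetical value because the batch size is not given.

```python
import math


def lr_multiplier(step: int, warmup_steps: int, total_steps: int, num_cycles: int = 6) -> float:
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)      # linear warm-up from 0 to 1
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Each cycle decays from 1 towards 0 along a cosine, then restarts at 1.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0))))


steps_per_epoch = 100            # hypothetical; depends on the unstated batch size
warmup = 2 * steps_per_epoch     # 2 warm-up epochs
total = 20 * steps_per_epoch     # 20 training epochs
base_lr = 5e-05
for step in (0, 100, 200, 350, 490, 510):
    print(step, base_lr * lr_multiplier(step, warmup, total))
```

With 6 cycles over the 18 post-warm-up epochs, each restart spans 3 epochs under these assumptions.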

Performance

The solution was validated on a stratified subset of the RAID train dataset. This benchmark comprises a diverse dataset covering:

11 LLMs
11 adversarial attacks
8 domains

The detector's performance is as follows:

Model	Accuracy
Human	0.731
ChatGPT	0.992
GPT-2	0.649
GPT-3	0.945
GPT-4	0.985
LLaMA-Chat	0.980
Mistral	0.644
Mistral-Chat	0.975
Cohere	0.823
Cohere-Chat	0.906
MPT	0.757
MPT-Chat	0.943
Average	0.852
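
For readers who want to work with these figures, the short sketch below loads the per-source accuracies from the table, computes their unweighted mean, and lists the weakest rows. The unweighted mean need not equal the reported average of 0.852, because the source does not state how that average is computed (it may, for instance, be weighted by sample counts, which is an assumption).

```python
# Per-source accuracies copied from the table above.
accuracy = {
    "Human": 0.731,
    "ChatGPT": 0.992,
    "GPT-2": 0.649,
    "GPT-3": 0.945,
    "GPT-4": 0.985,
    "LLaMA-Chat": 0.980,
    "Mistral": 0.644,
    "Mistral-Chat": 0.975,
    "Cohere": 0.823,
    "Cohere-Chat": 0.906,
    "MPT": 0.757,
    "MPT-Chat": 0.943,
}

macro_average = sum(accuracy.values()) / len(accuracy)
weakest = sorted(accuracy, key=accuracy.get)[:3]

print(f"unweighted mean over {len(accuracy)} rows: {macro_average:.3f}")
print("lowest three:", weakest)
```

The lowest rows are the non-chat base models (Mistral, GPT-2) and human-written text, so false positives on human text deserve attention when choosing a decision threshold.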