# desklib/ai-text-detector-academic-v1.01
This AI-generated text detection model, developed by Desklib, is tailored for academic data. It classifies English text as human-written or AI-generated, leveraging a fine-tuned microsoft/deberta-v3-large architecture for high accuracy. It is robust against basic adversarial manipulations in academic contexts and is useful for academic integrity checks, content moderation, and verifying the authenticity of scholarly writing.
## 🚀 Quick Start
Try the model online: Desklib AI Detector
## ✨ Features
- Specifically fine-tuned for academic-related data.
- Classifies English text as human-written or AI-generated.
- Built on a fine-tuned microsoft/deberta-v3-large transformer architecture, achieving high accuracy.
- Robust against basic adversarial manipulations in academic contexts.
- Useful for academic integrity checks, content moderation, and verifying the authenticity of scholarly writing.
## 📦 Installation
This model can be used with the Hugging Face `transformers` library. Install the required packages with:

```bash
pip install transformers torch
```

Note: the DeBERTa-v3 tokenizer may additionally require the `sentencepiece` package; install it if tokenizer loading fails.
## 💻 Usage Examples
### Basic Usage
```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoConfig, AutoModel, PreTrainedModel


class DesklibAIDetectionModel(PreTrainedModel):
    config_class = AutoConfig

    def __init__(self, config):
        super().__init__(config)
        # Transformer backbone plus a single-logit classifier head.
        self.model = AutoModel.from_config(config)
        self.classifier = nn.Linear(config.hidden_size, 1)
        self.init_weights()

    def forward(self, input_ids, attention_mask=None, labels=None):
        if attention_mask is None:
            # Treat every token as real input when no mask is supplied.
            attention_mask = torch.ones_like(input_ids)

        outputs = self.model(input_ids, attention_mask=attention_mask)
        last_hidden_state = outputs[0]

        # Mean pooling: average token embeddings, weighted by the attention
        # mask so that padding tokens do not dilute the representation.
        input_mask_expanded = attention_mask.unsqueeze(-1).expand(last_hidden_state.size()).float()
        sum_embeddings = torch.sum(last_hidden_state * input_mask_expanded, dim=1)
        sum_mask = torch.clamp(input_mask_expanded.sum(dim=1), min=1e-9)
        pooled_output = sum_embeddings / sum_mask

        logits = self.classifier(pooled_output)

        loss = None
        if labels is not None:
            loss_fct = nn.BCEWithLogitsLoss()
            loss = loss_fct(logits.view(-1), labels.float())

        output = {"logits": logits}
        if loss is not None:
            output["loss"] = loss
        return output


def predict_single_text(text, model, tokenizer, device, max_len=768, threshold=0.5):
    encoded = tokenizer(
        text,
        padding='max_length',
        truncation=True,
        max_length=max_len,
        return_tensors='pt'
    )
    input_ids = encoded['input_ids'].to(device)
    attention_mask = encoded['attention_mask'].to(device)

    model.eval()
    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
        logits = outputs["logits"]
        # Sigmoid turns the single logit into the probability that the text is AI-generated.
        probability = torch.sigmoid(logits).item()

    # Texts at or above the threshold are labeled AI-generated (1).
    label = 1 if probability >= threshold else 0
    return probability, label


def main():
    model_directory = "desklib/ai-text-detector-academic-v1.01"

    tokenizer = AutoTokenizer.from_pretrained(model_directory)
    model = DesklibAIDetectionModel.from_pretrained(model_directory)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    text = "AI detection refers to the process of identifying whether a given piece of content, such as text, images, or audio, has been generated by artificial intelligence. This is achieved using various machine learning techniques, including perplexity analysis, entropy measurements, linguistic pattern recognition, and neural network classifiers trained on human and AI-generated data. Advanced AI detection tools assess writing style, coherence, and statistical properties to determine the likelihood of AI involvement. These tools are widely used in academia, journalism, and content moderation to ensure originality, prevent misinformation, and maintain ethical standards. As AI-generated content becomes increasingly sophisticated, AI detection methods continue to evolve, integrating deep learning models and ensemble techniques for improved accuracy."

    probability, predicted_label = predict_single_text(text, model, tokenizer, device)
    print(f"Probability of being AI generated: {probability:.4f}")
    print(f"Predicted label: {'AI Generated' if predicted_label == 1 else 'Not AI Generated'}")


if __name__ == "__main__":
    main()
```
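To score several texts at once, the same model can be run on a padded batch. The helper below is a hedged sketch modeled on `predict_single_text` above; `predict_batch` is our own hypothetical helper, not part of the released code, and it reuses the `model`, `tokenizer`, and `device` objects from `main()`:

```python
def predict_batch(texts, model, tokenizer, device, max_len=768, threshold=0.5):
    """Hypothetical helper: score a list of texts in one forward pass."""
    encoded = tokenizer(
        texts,
        padding=True,              # pad only to the longest text in the batch
        truncation=True,
        max_length=max_len,
        return_tensors='pt'
    )
    model.eval()
    with torch.no_grad():
        outputs = model(
            input_ids=encoded['input_ids'].to(device),
            attention_mask=encoded['attention_mask'].to(device),
        )
        # One probability per input text.
        probabilities = torch.sigmoid(outputs["logits"]).squeeze(-1).tolist()
    return [(p, int(p >= threshold)) for p in probabilities]
```

Dynamic padding (`padding=True`) pads each batch only to its longest member, which is cheaper than always padding to `max_len` as the single-text helper does.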
## 📚 Documentation
### Model Architecture
The model is built upon a fine-tuned microsoft/deberta-v3-large transformer architecture. The core components, illustrated in the sketch after this list, are:
- Transformer Base: The pre-trained microsoft/deberta-v3-large model serves as the foundation. DeBERTa (Decoding-enhanced BERT with disentangled attention) improves on BERT and RoBERTa by incorporating disentangled attention and an enhanced mask decoder.
- Mean Pooling: A mean pooling layer aggregates the hidden states from the transformer into a fixed-size representation of the input text. It averages the token embeddings, weighted by the attention mask, so that padding tokens do not dilute the overall semantic representation.
- Classifier Head: A linear layer takes the pooled representation and outputs a single logit representing the model's confidence that the input text is AI-generated. A sigmoid activation converts the logit into a probability.
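For illustration, here is a minimal, self-contained sketch of the pooling-and-classification step described above. All shapes and values are hypothetical toy dimensions, not those of the actual checkpoint (deberta-v3-large uses a hidden size of 1024):

```python
import torch
import torch.nn as nn

# Toy shapes: batch of 2 sequences, 4 token positions, hidden size 8.
last_hidden_state = torch.randn(2, 4, 8)      # token embeddings from the transformer
attention_mask = torch.tensor([[1, 1, 1, 0],  # trailing zeros mark padding tokens
                               [1, 1, 0, 0]])

# Mask-weighted mean pooling: padding positions contribute nothing to the average.
mask = attention_mask.unsqueeze(-1).float()   # (2, 4, 1), broadcasts over the hidden dim
pooled = (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# Classifier head: one logit per sequence, squashed to a probability by a sigmoid.
classifier = nn.Linear(8, 1)
probability = torch.sigmoid(classifier(pooled))  # shape (2, 1), values in [0, 1]
print(probability)
```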
### Limitations
- The model is fine-tuned for academic-related data and may not perform optimally on general-purpose or creative writing texts. Check out our standard AI Detector here: https://huggingface.co/desklib/ai-text-detector-v1.01
- It is not fine-tuned against advanced adversarial attacks, but it performs well against basic adversarial manipulations.
- Since AI-generated text detection is an evolving field, the model may require periodic updates to keep pace with newer AI text generation models.
## 📄 License
This project is licensed under the MIT License.