answerdotai-ModernBERT-base-ai-detector
This model is a fine-tuned version of answerdotai/ModernBERT-base for AI vs. human text classification, trained on the DAIGT V2 Train Dataset. It distinguishes AI-generated from human-written text, reaching a validation loss of 0.0036 on the evaluation set.
Quick Start
Install the transformers library, load the model, and run it through a text-classification pipeline as shown in the Usage Examples section below.
Features
- Binary classification of text as AI-generated or human-written.
- Built on the lightweight and efficient ModernBERT-base architecture.
- Fine-tuned on the DAIGT V2 Train Dataset (35,894 training samples, 8,974 test samples).
- Reaches a validation loss of 0.0036 on the evaluation set.
Model Description
This model is based on ModernBERT-base, a lightweight and efficient BERT-style model. It has been fine-tuned for AI-generated vs. human-written text classification, allowing it to distinguish between texts written by AI models (ChatGPT, DeepSeek, Claude, etc.) and texts written by human authors.
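Concretely, fine-tuning a checkpoint like this amounts to loading the base model with a two-class sequence-classification head. A minimal sketch using the standard transformers API (the exact training script is not part of this card):

from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Start from the base checkpoint and attach a 2-class classification head
base = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)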
Intended Uses & Limitations
Intended Uses
- AI-generated content detection (e.g., ChatGPT, Claude, DeepSeek).
- Text classification for distinguishing human-written from AI-generated content.
- Educational and research applications in AI-content detection.
Limitations
- Not 100% accurate: some AI-generated texts may resemble human writing, and vice versa.
- Limited to the scope of the training data: the model may struggle with out-of-domain text.
- Bias risks: if the dataset contains biases, the model may inherit them.
Installation
No specific installation steps are provided in the original document. The usage example below only needs the transformers library with a PyTorch backend, which can typically be installed with pip install transformers torch.
Usage Examples
Basic Usage
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load the fine-tuned detector and its tokenizer
model_name = "answerdotai/ModernBERT-base-ai-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Wrap them in a text-classification pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Classify a sample text
text = "This text was written by an AI model like ChatGPT."
result = classifier(text)
print(result)
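The pipeline returns a list with one dict per input, each containing a label and a score. A minimal sketch of turning that into a readable verdict, assuming the checkpoint exposes the default LABEL_0/LABEL_1 names for the 0 (human) and 1 (AI) classes described under Training and Evaluation Data (the exact label strings stored in the checkpoint are not stated in the original card):

# Map generic label ids to the 0 = human / 1 = AI convention described below
label_map = {"LABEL_0": "human-written", "LABEL_1": "AI-generated"}  # assumed default names
prediction = classifier("Another passage to check.")[0]
print(label_map.get(prediction["label"], prediction["label"]), round(prediction["score"], 4))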
Documentation
Training and Evaluation Data
- The model was fine-tuned on 35,894 training samples and evaluated on 8,974 test samples.
- The dataset consists of AI-generated text samples (ChatGPT, Claude, DeepSeek, etc.) and human-written samples (Wikipedia, books, articles).
- Labels (see the loading sketch after this list for attaching readable names):
  - 1 → AI-generated text
  - 0 → Human-written text
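If you want the pipeline to report these names directly instead of generic label ids, one option is to attach the mapping when loading the model. A minimal sketch; whether the published checkpoint already stores readable label names is not stated in the original card:

from transformers import AutoModelForSequenceClassification

# Attach human-readable names for the 0/1 labels listed above
model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base-ai-detector",
    id2label={0: "human-written", 1: "AI-generated"},
    label2id={"human-written": 0, "AI-generated": 1},
)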
Training Procedure
Training Hyperparameters
The following hyperparameters were used during training:
| Property | Details |
| --- | --- |
| Learning Rate | 2e-5 |
| Train Batch Size | 16 |
| Eval Batch Size | 16 |
| Optimizer | AdamW (β1 = 0.9, β2 = 0.999, ε = 1e-08) |
| LR Scheduler | Linear |
| Epochs | 3 |
| Mixed Precision | Native AMP (fp16) |
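A minimal sketch of how this configuration might be expressed with the transformers Trainer API; the output directory, dataset objects, and the evaluation/logging cadence are assumptions, not part of the original card:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="modernbert-ai-detector",  # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                # native AMP mixed precision
    eval_strategy="steps",    # evaluation cadence is an assumption
    eval_steps=500,
    logging_steps=500,
)
# A Trainer would then be built with these arguments plus the (assumed)
# tokenized train/eval datasets and the classification model shown above.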
Training Results
| Training Loss | Epoch | Step | Validation Loss |
| --- | --- | --- | --- |
| 0.0505 | 0.22 | 500 | 0.0214 |
| 0.0114 | 0.44 | 1000 | 0.0110 |
| 0.0088 | 0.66 | 1500 | 0.0032 |
| 0.0 | 0.89 | 2000 | 0.0048 |
| 0.0068 | 1.11 | 2500 | 0.0035 |
| 0.0 | 1.33 | 3000 | 0.0040 |
| 0.0 | 1.55 | 3500 | 0.0097 |
| 0.0053 | 1.78 | 4000 | 0.0101 |
| 0.0 | 2.00 | 4500 | 0.0053 |
| 0.0 | 2.22 | 5000 | 0.0039 |
| 0.0017 | 2.45 | 5500 | 0.0046 |
| 0.0 | 2.67 | 6000 | 0.0043 |
| 0.0 | 2.89 | 6500 | 0.0036 |
Framework Versions
| Property | Details |
| --- | --- |
| Transformers | 4.48.3 |
| PyTorch | 2.5.1+cu124 |
| Datasets | 3.3.2 |
| Tokenizers | 0.21.0 |
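To check that a local environment roughly matches these versions (purely illustrative):

import transformers, torch, datasets, tokenizers

print(transformers.__version__)  # expected 4.48.3
print(torch.__version__)         # expected 2.5.1+cu124
print(datasets.__version__)      # expected 3.3.2
print(tokenizers.__version__)    # expected 0.21.0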
License
This model is licensed under the Apache-2.0 license.