# 🏆 Winning Model for the COLING 2025 Workshop on Detecting AI Generated Content (DAIGenC)

A binary classification model that secured first place in the COLING 2025 GenAI Detection Task.

## 🚀 Quick Start

This is a binary classifier for machine-generated text fragments that achieved first place in the monolingual subtask of the COLING 2025 GenAI Detection Task. It is a fine-tuned version of DeBERTa-v3-base trained in multi-task mode, with a shared encoder and three parallel classification heads; only one head is used at inference.
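For intuition, here is a conceptual sketch of that layout. This is not the released implementation: the class and head names, the `num_domains` parameter, and the pooling choice are assumptions based on the description above and the paper's domain-aware multi-tasking framing.

```python
import torch


class SharedEncoderMultiTask(torch.nn.Module):
    """Hypothetical sketch: one shared encoder with three parallel heads.

    Only the binary human/machine head is used at inference; the two
    auxiliary heads (e.g., domain classification) assist during training.
    """

    def __init__(self, encoder, hidden_size: int, num_domains: int = 10):
        super().__init__()
        self.encoder = encoder  # shared DeBERTa-v3-base encoder
        self.detect_head = torch.nn.Linear(hidden_size, 2)               # human vs. machine
        self.aux_head_a = torch.nn.Linear(hidden_size, num_domains)      # auxiliary, training only
        self.aux_head_b = torch.nn.Linear(hidden_size, num_domains)      # auxiliary, training only

    def forward(self, **inputs):
        # Pool the [CLS] token; at inference only the detection head is run.
        hidden = self.encoder(**inputs).last_hidden_state[:, 0]
        return self.detect_head(hidden)
```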
⨠Features
- First - Place Performance: Secured the top position in the monolingual subtask of the COLING 2025 GenAI Detection Task.
- Multi - Task Design: Based on DeBERTa - v3 - base, it operates in multi - task mode with a shared encoder and parallel classification heads.
## 📦 Installation

The installation mainly involves setting up the `transformers` library. You can install it, along with PyTorch, using `pip`:

```bash
pip install transformers torch
```
## 💻 Usage Examples

### Basic Usage

```python
import torch
from transformers import AutoTokenizer, DebertaV2ForSequenceClassification


class MLayerDebertaV2ForSequenceClassification(DebertaV2ForSequenceClassification):
    """DeBERTa-v3 with the multi-layer classification head used at inference."""

    def __init__(self, config, **kwargs):
        super().__init__(config)
        # Replace the default single-layer classifier with the model's MLP head.
        self.classifier = torch.nn.Sequential(
            torch.nn.Linear(config.hidden_size, 512),
            torch.nn.GELU(),
            torch.nn.Linear(512, 256),
            torch.nn.GELU(),
            torch.nn.Dropout(0.5),
            torch.nn.Linear(256, 2),
        )


tokenizer = AutoTokenizer.from_pretrained(
    "OU-Advacheck/deberta-v3-base-daigenc-mgt1a"
)
model = MLayerDebertaV2ForSequenceClassification.from_pretrained(
    "OU-Advacheck/deberta-v3-base-daigenc-mgt1a"
)
model.eval()

inputs = tokenizer(
    ['Hello, Thanks for sharing your health concern with us. I have gone through your query and here are your answers: 1. If you have regular cycles, there is no further need to use any medication to regulate cycles. 2. Establishment of regular ovulation and timing of intercourse properly is necessary. 3. If you want to conceive quickly, you have to get further evaluation and plan management. Hope this helps.',
     'He might have small intestinal TB rather than stomach TB. Amoebas also involves small intestine/some part of large intestine. If he has taken medicines for both diseases in form of a Complete Course, he should be fine. U can go for an oral+iv contrast CT scan of him. Now, the diagnosis of a lax cardiac can be confirmed by an upper GI endoscopy with manometry (if available). Lax cardiac may cause acidity with reflux.'],
    max_length=512,
    truncation=True,
    padding="max_length",
    return_tensors="pt"
)

# Softmax probability of class 1 (machine-generated) for each input text.
scores = torch.softmax(
    model(**inputs)[0], dim=1
).detach().cpu()[:, 1].tolist()
print(scores)
```
## 📚 Documentation

### Limitations and bias

The model was trained on a dataset of machine-generated and human-written texts drawn from particular sources and domains over a specific period, so it may not be suitable for all use cases in other domains. It may also produce false positives, at a rate that depends on the chosen classification threshold.
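For illustration, a minimal sketch of applying a decision threshold to the `scores` list from the usage example above. The 0.92 value is the competition threshold reported under Quality below; for your own data you may want to tune it to trade off false positives against false negatives:

```python
# Hedged sketch: map machine-class probabilities to hard labels.
# `scores` comes from the usage example above; 0.92 is the competition threshold.
threshold = 0.92
labels = [1 if p >= threshold else 0 for p in scores]  # 1 = machine, 0 = human
```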
### Quality

Quality on the declared test set in the competition (with a 0.92 probability threshold):

| Model | Main Score (F1 Macro) | Auxiliary Score (F1 Micro) |
|---|---|---|
| MTL DeBERTa-v3-base (ours) | 0.8307 | 0.8311 |
| Single-task DeBERTa-v3-base | 0.7852 | 0.7891 |
| Baseline | 0.7342 | 0.7343 |
### Training procedure

The model was fine-tuned on the training split of the English (monolingual) part of the MGT Detection Task 1 dataset. The classes are `0` (human) and `1` (machine). Fine-tuning was done in two stages on a single NVIDIA RTX 3090 GPU, with the hyperparameters detailed in our paper.
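As a starting point for reproducing the data side, a minimal sketch of loading that dataset from the Hugging Face Hub, assuming the `datasets` library and the dataset id listed in the information table below (the available split and column names are not guaranteed and should be inspected):

```python
from datasets import load_dataset

# Hedged sketch: the dataset id is taken from the information table below.
dataset = load_dataset("Jinyan1/COLING_2025_MGT_en")
print(dataset)  # inspect which splits and columns actually exist
```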
### Your Own Fine-Tuning

If you wish to fine-tune this architecture on your own data domains or base models, our training and inference code, with full instructions, is available on GitHub.
## 📄 License

This project is licensed under the MIT License.
## 📝 Citation

If you use the results of this model in your research, please cite our paper:

```bibtex
@misc{gritsai2024advacheckgenaidetectiontask,
      title={Advacheck at GenAI Detection Task 1: AI Detection Powered by Domain-Aware Multi-Tasking},
      author={German Gritsai and Anastasia Voznyuk and Ildar Khabutdinov and Andrey Grabovoy},
      year={2024},
      eprint={2411.11736},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.11736},
}
```
## Information Table

| Property | Details |
|---|---|
| Library Name | transformers |
| Model Type | Binary classification model |
| Base Model | microsoft/deberta-v3-base |
| Training Data | Jinyan1/COLING_2025_MGT_en |
| Metrics | f1 |
| License | MIT |