# 🏆 Winning Model for the COLING 2025 Workshop on Detecting AI Generated Content (DAIGenC)

A binary classification model that secured first place in the COLING 2025 GenAI Detection Task.

## 🚀 Quick Start

This is a binary classifier for machine-generated text fragments that achieved first place in the monolingual subtask of the COLING 2025 GenAI Detection Task. It is a fine-tuned version of DeBERTa-v3-base trained in multi-task mode, with a shared encoder and three parallel classification heads; only one head is used at inference.
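For intuition, here is a conceptual sketch of that layout. This is not the released implementation: the class and head names, the `num_domains` parameter, and the pooling choice are assumptions based on the description above and the paper's domain-aware multi-tasking framing.

```python
import torch


class SharedEncoderMultiTask(torch.nn.Module):
    """Hypothetical sketch: one shared encoder with three parallel heads.

    Only the binary human/machine head is used at inference; the two
    auxiliary heads (e.g., domain classification) assist during training.
    """

    def __init__(self, encoder, hidden_size: int, num_domains: int = 10):
        super().__init__()
        self.encoder = encoder  # shared DeBERTa-v3-base encoder
        self.detect_head = torch.nn.Linear(hidden_size, 2)               # human vs. machine
        self.aux_head_a = torch.nn.Linear(hidden_size, num_domains)      # auxiliary, training only
        self.aux_head_b = torch.nn.Linear(hidden_size, num_domains)      # auxiliary, training only

    def forward(self, **inputs):
        # Pool the [CLS] token; at inference only the detection head is run.
        hidden = self.encoder(**inputs).last_hidden_state[:, 0]
        return self.detect_head(hidden)
```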
⨠Features
- First - Place Performance: Secured the top position in the monolingual subtask of the COLING 2025 GenAI Detection Task.
- Multi - Task Design: Based on DeBERTa - v3 - base, it operates in multi - task mode with a shared encoder and parallel classification heads.
## 📦 Installation

The installation mainly involves setting up the `transformers` library. You can install it, along with PyTorch, using `pip`:

```bash
pip install transformers torch
```
## 💻 Usage Examples

### Basic Usage

```python
import torch
from transformers import AutoTokenizer, DebertaV2ForSequenceClassification


class MLayerDebertaV2ForSequenceClassification(DebertaV2ForSequenceClassification):
    """DeBERTa-v3 with the multi-layer classification head used at inference."""

    def __init__(self, config, **kwargs):
        super().__init__(config)
        # Replace the default single-layer classifier with the model's MLP head.
        self.classifier = torch.nn.Sequential(
            torch.nn.Linear(config.hidden_size, 512),
            torch.nn.GELU(),
            torch.nn.Linear(512, 256),
            torch.nn.GELU(),
            torch.nn.Dropout(0.5),
            torch.nn.Linear(256, 2),
        )


tokenizer = AutoTokenizer.from_pretrained(
    "OU-Advacheck/deberta-v3-base-daigenc-mgt1a"
)
model = MLayerDebertaV2ForSequenceClassification.from_pretrained(
    "OU-Advacheck/deberta-v3-base-daigenc-mgt1a"
)
model.eval()

inputs = tokenizer(
    ['Hello, Thanks for sharing your health concern with us. I have gone through your query and here are your answers: 1. If you have regular cycles, there is no further need to use any medication to regulate cycles. 2. Establishment of regular ovulation and timing of intercourse properly is necessary. 3. If you want to conceive quickly, you have to get further evaluation and plan management. Hope this helps.',
     'He might have small intestinal TB rather than stomach TB. Amoebas also involves small intestine/some part of large intestine. If he has taken medicines for both diseases in form of a Complete Course, he should be fine. U can go for an oral+iv contrast CT scan of him. Now, the diagnosis of a lax cardiac can be confirmed by an upper GI endoscopy with manometry (if available). Lax cardiac may cause acidity with reflux.'],
    max_length=512,
    truncation=True,
    padding="max_length",
    return_tensors="pt"
)

# Softmax probability of class 1 (machine-generated) for each input text.
scores = torch.softmax(
    model(**inputs)[0], dim=1
).detach().cpu()[:, 1].tolist()
print(scores)
```
## 📚 Documentation

### Limitations and bias

The model was trained on a dataset of machine-generated and human-written texts drawn from particular sources and domains over a specific period, so it may not be suitable for all use cases in other domains. It may also produce false positives, at a rate that depends on the chosen classification threshold.
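For illustration, a minimal sketch of applying a decision threshold to the `scores` list from the usage example above. The 0.92 value is the competition threshold reported under Quality below; for your own data you may want to tune it to trade off false positives against false negatives:

```python
# Hedged sketch: map machine-class probabilities to hard labels.
# `scores` comes from the usage example above; 0.92 is the competition threshold.
threshold = 0.92
labels = [1 if p >= threshold else 0 for p in scores]  # 1 = machine, 0 = human
```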
### Quality

Quality on the declared test set in the competition (with a 0.92 probability threshold):

| Model | Main Score (F1 Macro) | Auxiliary Score (F1 Micro) |
|---|---|---|
| MTL DeBERTa-v3-base (ours) | 0.8307 | 0.8311 |
| Single-task DeBERTa-v3-base | 0.7852 | 0.7891 |
| Baseline | 0.7342 | 0.7343 |
### Training procedure

The model was fine-tuned on the training split of the English (monolingual) part of the MGT Detection Task 1 dataset. The classes are `0` (human) and `1` (machine). Fine-tuning was done in two stages on a single NVIDIA RTX 3090 GPU, with the hyperparameters detailed in our paper.
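As a starting point for reproducing the data side, a minimal sketch of loading that dataset from the Hugging Face Hub, assuming the `datasets` library and the dataset id listed in the information table below (the available split and column names are not guaranteed and should be inspected):

```python
from datasets import load_dataset

# Hedged sketch: the dataset id is taken from the information table below.
dataset = load_dataset("Jinyan1/COLING_2025_MGT_en")
print(dataset)  # inspect which splits and columns actually exist
```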
### Your Own Fine-Tuning

If you wish to fine-tune this architecture on your own data domains or base models, our training and inference code, with full instructions, is available on GitHub.
## 📄 License

This project is licensed under the MIT License.
## 📝 Citation

If you use the results of this model in your research, please cite our paper:

```bibtex
@misc{gritsai2024advacheckgenaidetectiontask,
      title={Advacheck at GenAI Detection Task 1: AI Detection Powered by Domain-Aware Multi-Tasking},
      author={German Gritsai and Anastasia Voznyuk and Ildar Khabutdinov and Andrey Grabovoy},
      year={2024},
      eprint={2411.11736},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.11736},
}
```
## Information Table

| Property | Details |
|---|---|
| Library Name | transformers |
| Model Type | Binary classification model |
| Base Model | microsoft/deberta-v3-base |
| Training Data | Jinyan1/COLING_2025_MGT_en |
| Metrics | f1 |
| License | MIT |