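🚀 Longformer-base model for machine-paraphrase detection
This model detects machine-paraphrased plagiarism. It is a pre-trained Longformer-base model fine-tuned on a machine-paraphrase dataset, intended to support academic integrity by flagging text that has been rewritten with paraphrasing tools.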
🚀 Quick start
Loading and using the model
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained("jpelhaw/longformer-base-plagiarism-detection")
tokenizer = AutoTokenizer.from_pretrained("jpelhaw/longformer-base-plagiarism-detection")

text = "Plagiarism is the representation of another author's writing, \
thoughts, ideas, or expressions as one's own work."

# Encode the text as PyTorch tensors; special tokens are added by default
example = tokenizer(text, return_tensors="pt")
answer = model(**example)  # answer.logits holds the raw class scores
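The model call returns raw logits rather than a verdict. The following is a minimal sketch of turning such logits into a label and a probability, assuming a two-class classification head. The label names and their order in `ASSUMED_LABELS` are hypothetical, since this model card does not document them; `model.config.id2label` is the place to check the real mapping. The logits in the usage line are made up for illustration.

```python
import math

# Hypothetical label names and order: the model card does not document them.
ASSUMED_LABELS = ("original", "machine-paraphrased")


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, labels=ASSUMED_LABELS):
    """Return the most likely label together with its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up logits for illustration; with the real model one could pass
# answer.logits[0].tolist() instead.
label, prob = predict_label([-1.2, 2.3])
print(label, round(prob, 3))
```

Keeping the post-processing in plain Python makes the step easy to inspect; in a PyTorch pipeline, `torch.softmax(answer.logits, dim=-1)` would compute the same probabilities.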
📚 Documentation
Citation
If you use this model in your research, please cite the following paper:
@InProceedings{10.1007/978-3-030-96957-8_34,
author="Wahle, Jan Philip and Ruas, Terry and Folt{\'y}nek, Tom{\'a}{\v{s}} and Meuschke, Norman and Gipp, Bela",
title="Identifying Machine-Paraphrased Plagiarism",
booktitle="Information for a Better World: Shaping the Global Future",
year="2022",
publisher="Springer International Publishing",
address="Cham",
pages="393--413",
abstract="Employing paraphrasing tools to conceal plagiarized text is a severe threat to academic integrity. To enable the detection of machine-paraphrased text, we evaluate the effectiveness of five pre-trained word embedding models combined with machine learning classifiers and state-of-the-art neural language models. We analyze preprints of research papers, graduation theses, and Wikipedia articles, which we paraphrased using different configurations of the tools SpinBot and SpinnerChief. The best performing technique, Longformer, achieved an average F1 score of 80.99{\%} (F1=99.68{\%} for SpinBot and F1=71.64{\%} for SpinnerChief cases), while human evaluators achieved F1=78.4{\%} for SpinBot and F1=65.6{\%} for SpinnerChief cases. We show that the automated classification alleviates shortcomings of widely-used text-matching systems, such as Turnitin and PlagScan.",
isbn="978-3-030-96957-8"
}
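Additional information
- This model is a checkpoint of Longformer-base after training on a machine-paraphrased plagiarism dataset.
- More information about this model: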
📄 License
The documentation does not specify a license.
🔍 Other information
- Dataset: jpwahle/machine-paraphrase-dataset
- Widget example text: Plagiarism is the representation of another author's writing, thoughts, ideas, or expressions as one's own work.