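🚀 Longformer-base model for machine-paraphrase detection
This model detects machine-paraphrased plagiarism. It is a pretrained Longformer-base model fine-tuned on a machine-paraphrase dataset, and it is intended to support academic integrity by improving the accuracy of plagiarism detection.
🚀 Quick start
Loading and using the model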
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained("jpelhaw/longformer-base-plagiarism-detection")
tokenizer = AutoTokenizer.from_pretrained("jpelhaw/longformer-base-plagiarism-detection")

text = "Plagiarism is the representation of another author's writing, \
thoughts, ideas, or expressions as one's own work."

# Calling the tokenizer (not tokenizer.tokenize) returns input_ids and attention_mask as tensors
example = tokenizer(text, return_tensors="pt", truncation=True)
answer = model(**example)
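The `answer` object returned above holds raw classification logits rather than probabilities. The following is a minimal sketch of one way to turn such logits into a label and a confidence score. It uses only the Python standard library so it runs without downloading the model. The logit values and the label names in `assumed_id2label` are hypothetical and chosen for illustration; the actual mapping should be read from `model.config.id2label`.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpret(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical values for illustration only. With the real model, the logits
# would come from answer.logits[0].detach().tolist() and the mapping from
# model.config.id2label; the label names below are an assumption.
example_logits = [-1.2, 2.3]
assumed_id2label = {0: "original", 1: "machine-paraphrased"}

label, prob = interpret(example_logits, assumed_id2label)
print(f"{label}: {prob:.3f}")
```

Reading the mapping from the model configuration rather than hard-coding it avoids silently swapping the two classes if the checkpoint uses a different label order.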
📚 Documentation
Citation
If you use this model in your research, please cite the following paper:
@InProceedings{10.1007/978-3-030-96957-8_34,
author="Wahle, Jan Philip and Ruas, Terry and Folt{\'y}nek, Tom{\'a}{\v{s}} and Meuschke, Norman and Gipp, Bela",
title="Identifying Machine-Paraphrased Plagiarism",
booktitle="Information for a Better World: Shaping the Global Future",
year="2022",
publisher="Springer International Publishing",
address="Cham",
pages="393--413",
abstract="Employing paraphrasing tools to conceal plagiarized text is a severe threat to academic integrity. To enable the detection of machine-paraphrased text, we evaluate the effectiveness of five pre-trained word embedding models combined with machine learning classifiers and state-of-the-art neural language models. We analyze preprints of research papers, graduation theses, and Wikipedia articles, which we paraphrased using different configurations of the tools SpinBot and SpinnerChief. The best performing technique, Longformer, achieved an average F1 score of 80.99{\%} (F1=99.68{\%} for SpinBot and F1=71.64{\%} for SpinnerChief cases), while human evaluators achieved F1=78.4{\%} for SpinBot and F1=65.6{\%} for SpinnerChief cases. We show that the automated classification alleviates shortcomings of widely-used text-matching systems, such as Turnitin and PlagScan.",
isbn="978-3-030-96957-8"
}
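Additional information
- This model is a checkpoint of Longformer-base after training on the machine-paraphrased plagiarism dataset.
- More information about this model:
📄 License
The documentation does not mention any license information.
🔍 Other information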
- Dataset: jpwahle/machine-paraphrase-dataset
- Widget example text: Plagiarism is the representation of another author's writing, thoughts, ideas, or expressions as one's own work.