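🚀 Text Paraphrasing Model
This project builds on the T5-base model and uses transfer learning so that the model can generate high-quality paraphrases comparable to those of ChatGPT. It is one of the better paraphrasing models available on Hugging Face.
🚀 Quick Start
The model was trained on the ChatGPT paraphrases dataset. That dataset is built from Quora paraphrase questions, SQuAD 2.0, and the CNN news dataset.
Deployment example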
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("humarin/chatgpt_paraphraser_on_T5_base")
model = AutoModelForSeq2SeqLM.from_pretrained("humarin/chatgpt_paraphraser_on_T5_base").to(device)

def paraphrase(
    question,
    num_beams=5,
    num_beam_groups=5,
    num_return_sequences=5,
    repetition_penalty=10.0,
    diversity_penalty=3.0,
    no_repeat_ngram_size=2,
    temperature=0.7,
    max_length=128
):
    # Prepend the "paraphrase: " task prefix and tokenize the input.
    input_ids = tokenizer(
        f'paraphrase: {question}',
        return_tensors="pt", padding="longest",
        max_length=max_length,
        truncation=True,
    ).input_ids.to(device)

    # Diverse (group) beam search; returns num_return_sequences candidates.
    outputs = model.generate(
        input_ids, temperature=temperature, repetition_penalty=repetition_penalty,
        num_return_sequences=num_return_sequences, no_repeat_ngram_size=no_repeat_ngram_size,
        num_beams=num_beams, num_beam_groups=num_beam_groups,
        max_length=max_length, diversity_penalty=diversity_penalty
    )

    # Decode token IDs back to strings, dropping special tokens.
    res = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return res
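The beam-search arguments of `paraphrase` constrain one another: with group beam search, Hugging Face `generate` expects `num_beams` to be divisible by `num_beam_groups`, `num_return_sequences` cannot exceed `num_beams`, and `diversity_penalty` only takes effect when there is more than one beam group. A minimal sketch of a pre-flight check follows. The helper name `check_generation_args` is hypothetical; it is not part of this model card or of transformers.

```python
def check_generation_args(num_beams, num_beam_groups, num_return_sequences, diversity_penalty):
    """Return a list of problems; an empty list means the settings look consistent.

    Hypothetical helper for illustration only (not a transformers API).
    """
    problems = []
    if num_beams < 1 or num_beam_groups < 1:
        problems.append("num_beams and num_beam_groups must be >= 1")
        return problems
    if num_beams % num_beam_groups != 0:
        problems.append("num_beams must be divisible by num_beam_groups")
    if num_return_sequences > num_beams:
        problems.append("num_return_sequences must be <= num_beams")
    if diversity_penalty > 0.0 and num_beam_groups == 1:
        problems.append("diversity_penalty only takes effect with more than one beam group")
    return problems


# The defaults used by paraphrase() pass the check.
default_problems = check_generation_args(5, 5, 5, 3.0)
# Asking for more sequences than beams, with beams not divisible by groups, does not.
bad_problems = check_generation_args(5, 2, 6, 3.0)
```

Running a check like this before calling `paraphrase` with custom values can surface a bad combination earlier than the error raised inside `generate`.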
💻 Usage Examples
Basic usage
text = 'What are the best places to see in New York?'
paraphrase(text)
['What are some must-see places in New York?',
'Can you suggest some must-see spots in New York?',
'Where should one go to experience the best NYC has to offer?',
'Which places should I visit in New York?',
'What are the top destinations to explore in New York?']
Advanced usage
text = "Rammstein's album Mutter was recorded in the south of France in May and June 2000, and mixed in Stockholm in October of that year."
paraphrase(text)
['In May and June 2000, Rammstein travelled to the south of France to record his album Mutter, which was mixed in Stockholm in October of that year.',
'The album Mutter by Rammstein was recorded in the south of France during May and June 2000, with mixing taking place in Stockholm in October of that year.',
'The album Mutter by Rammstein was recorded in the south of France during May and June 2000, with mixing taking place in Stockholm in October of that year. It',
'Mutter, the album released by Rammstein, was recorded in southern France during May and June 2000, with mixing taking place between October and September.',
'In May and June 2000, Rammstein recorded his album Mutter in the south of France, with the mix being made at Stockholm during October.']
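In the output above, the third candidate repeats the second with a stray trailing " It". A minimal post-processing sketch for dropping such candidates is shown below. The helper `drop_near_duplicates` is hypothetical and not part of the model card; it only compares candidates by prefix after normalizing case and whitespace.

```python
def drop_near_duplicates(candidates):
    """Keep the first occurrence of each candidate.

    A later candidate is dropped when, after lowercasing and collapsing
    whitespace, it is identical to, a prefix of, or an extension of a
    candidate that was already kept. Empty candidates are skipped.
    Hypothetical helper for illustration only.
    """
    kept, kept_norm = [], []
    for text in candidates:
        norm = " ".join(text.lower().split())
        if not norm:
            continue
        if any(norm.startswith(k) or k.startswith(norm) for k in kept_norm):
            continue
        kept.append(text)
        kept_norm.append(norm)
    return kept


sample = [
    "The album was recorded in France.",
    "The album was recorded in France. It",
    "Mutter was recorded in France.",
]
filtered = drop_near_duplicates(sample)
```

This removes only truncation-style duplicates. It does not catch factual drift such as "between October and September" in the fourth candidate, which still needs a separate check.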
🔧 Technical Details
Training parameters
epochs = 5
batch_size = 64
max_length = 128
lr = 5e-5
batches_qty = 196465
betas = (0.9, 0.999)
eps = 1e-08
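The source does not say whether `batches_qty` counts batches per epoch or over the whole run. One possible reading, offered here as an assumption, is that it is the total, since 196465 divides evenly by the 5 epochs. The arithmetic under that assumption:

```python
epochs = 5
batch_size = 64
batches_qty = 196465

# Assumption: batches_qty is the total number of batches over all epochs.
batches_per_epoch, remainder = divmod(batches_qty, epochs)

# Upper bound on training pairs seen per epoch (the last batch may be partial).
max_pairs_per_epoch = batches_per_epoch * batch_size

print(batches_per_epoch, remainder)  # 39293 0
print(max_pairs_per_epoch)           # 2514752
```

If `batches_qty` is instead a per-epoch count, the per-epoch bound would be 196465 × 64 = 12,573,760 pairs.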
BibTeX citation
@inproceedings{chatgpt_paraphraser,
    author={Vladimir Vorobev and Maxim Kuznetsov},
    title={A paraphrasing model based on ChatGPT paraphrases},
    year={2023}
}
📄 License
This project is released under the OpenRAIL license.
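📦 Model Information
| Property | Details |
|---|---|
| Model type | Text paraphrasing model based on T5-base |
| Training data | ChatGPT paraphrases dataset, built from Quora paraphrase questions, SQuAD 2.0, and the CNN news dataset |
| Inference parameters | num_beams: 5; num_beam_groups: 5; num_return_sequences: 5; repetition_penalty: 10.01; diversity_penalty: 3.01; no_repeat_ngram_size: 2; temperature: 0.7; max_length: 128 |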
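The inference parameters listed above map one-to-one onto the keyword arguments of the `paraphrase` function from the deployment example. A small sketch of that mapping is given below. The constant name `INFERENCE_PARAMS` is an assumption introduced for illustration; the values are copied from the table, including the 10.01 and 3.01 penalties, which differ slightly from the function defaults of 10.0 and 3.0.

```python
# Values copied from the inference-parameters row of the table; key names follow
# the keyword arguments of paraphrase() in the deployment example.
# INFERENCE_PARAMS is a hypothetical name used only in this sketch.
INFERENCE_PARAMS = {
    "num_beams": 5,
    "num_beam_groups": 5,
    "num_return_sequences": 5,
    "repetition_penalty": 10.01,
    "diversity_penalty": 3.01,
    "no_repeat_ngram_size": 2,
    "temperature": 0.7,
    "max_length": 128,
}

# With the model loaded as in the deployment example, the call would be:
#     paraphrase("What are the best places to see in New York?", **INFERENCE_PARAMS)
```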