🚀 文本改写模型
本项目基于T5-base模型,通过迁移学习使模型能够像ChatGPT一样生成高质量的改写文本,是Hugging Face上优秀的文本改写模型之一。
🚀 快速开始
本模型在 ChatGPT释义数据集 上进行训练。该数据集基于 Quora释义问题、SQUAD 2.0 以及 CNN新闻数据集 构建。
部署示例
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
device = "cuda"
tokenizer = AutoTokenizer.from_pretrained("humarin/chatgpt_paraphraser_on_T5_base")
model = AutoModelForSeq2SeqLM.from_pretrained("humarin/chatgpt_paraphraser_on_T5_base").to(device)
def paraphrase(
question,
num_beams=5,
num_beam_groups=5,
num_return_sequences=5,
repetition_penalty=10.0,
diversity_penalty=3.0,
no_repeat_ngram_size=2,
temperature=0.7,
max_length=128
):
input_ids = tokenizer(
f'paraphrase: {question}',
return_tensors="pt", padding="longest",
max_length=max_length,
truncation=True,
).input_ids.to(device)
outputs = model.generate(
input_ids, temperature=temperature, repetition_penalty=repetition_penalty,
num_return_sequences=num_return_sequences, no_repeat_ngram_size=no_repeat_ngram_size,
num_beams=num_beams, num_beam_groups=num_beam_groups,
max_length=max_length, diversity_penalty=diversity_penalty
)
res = tokenizer.batch_decode(outputs, skip_special_tokens=True)
return res
💻 使用示例
基础用法
text = 'What are the best places to see in New York?'
paraphrase(text)
['What are some must-see places in New York?',
'Can you suggest some must-see spots in New York?',
'Where should one go to experience the best NYC has to offer?',
'Which places should I visit in New York?',
'What are the top destinations to explore in New York?']
高级用法
text = "Rammstein's album Mutter was recorded in the south of France in May and June 2000, and mixed in Stockholm in October of that year."
paraphrase(text)
['In May and June 2000, Rammstein travelled to the south of France to record his album Mutter, which was mixed in Stockholm in October of that year.',
'The album Mutter by Rammstein was recorded in the south of France during May and June 2000, with mixing taking place in Stockholm in October of that year.',
'The album Mutter by Rammstein was recorded in the south of France during May and June 2000, with mixing taking place in Stockholm in October of that year. It',
'Mutter, the album released by Rammstein, was recorded in southern France during May and June 2000, with mixing taking place between October and September.',
'In May and June 2000, Rammstein recorded his album Mutter in the south of France, with the mix being made at Stockholm during October.']
🔧 技术细节
训练参数
epochs = 5
batch_size = 64
max_length = 128
lr = 5e-5
batches_qty = 196465
betas = (0.9, 0.999)
eps = 1e-08
BibTeX引用
@inproceedings{chatgpt_paraphraser,
author={Vladimir Vorobev, Maxim Kuznetsov},
title={A paraphrasing model based on ChatGPT paraphrases},
year={2023}
}
📄 许可证
本项目采用OpenRail许可证。
相关链接
📦 模型信息
属性 |
详情 |
模型类型 |
基于T5-base的文本改写模型 |
训练数据 |
基于ChatGPT释义数据集,该数据集基于Quora释义问题、SQUAD 2.0以及CNN新闻数据集构建 |
推理参数 |
束搜索数量:5;束搜索组数量:5;返回序列数量:5;重复惩罚:10.01;多样性惩罚:3.01;无重复n-gram大小:2;温度:0.7;最大长度:128 |