🚀 Text Rewriter Paraphraser
This repository contains a text-rewriting (paraphrasing) model fine-tuned from T5-Base, with 223 million parameters. The model rewrites input text effectively and produces high-quality paraphrases.
✨ Key Features
- Fine-tuned from T5-Base: builds on the pre-trained text-to-text transformer for effective paraphrasing.
- Large dataset (430k examples): trained on a comprehensive dataset combining three open-source sources, cleaned with several techniques to ensure the best performance.
- High-quality paraphrases: generates rewrites that substantially change sentence structure while staying accurate and factually correct.
- Harder to flag as AI-generated: aims to produce natural-sounding paraphrases that are difficult to distinguish from human-written text.
📦 Installation
No dedicated installation steps are documented. The usage example below only requires the `transformers` and `torch` packages (e.g. installed via pip).
💻 Usage Examples
Basic Usage
T5 models expect a task-specific prefix. Since this is a paraphrasing task, the prefix `paraphraser: ` is prepended to the input text.
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained("Ateeqq/Text-Rewriter-Paraphraser")
model = AutoModelForSeq2SeqLM.from_pretrained("Ateeqq/Text-Rewriter-Paraphraser").to(device)

def generate_paraphrase(text):
    # Prepend the task prefix and tokenize the input.
    input_ids = tokenizer(
        f"paraphraser: {text}",
        return_tensors="pt",
        padding="longest",
        truncation=True,
        max_length=64,
    ).input_ids.to(device)
    # Diverse beam search: 4 beams in 4 groups, returning 4 candidate rewrites.
    outputs = model.generate(
        input_ids,
        num_beams=4,
        num_beam_groups=4,
        num_return_sequences=4,
        repetition_penalty=10.0,
        diversity_penalty=3.0,
        no_repeat_ngram_size=2,
        temperature=0.8,
        max_length=64,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

text = 'By leveraging prior model training through transfer learning, fine-tuning can reduce the amount of expensive computing power and labeled data needed to obtain large models tailored to niche use cases and business needs.'
generate_paraphrase(text)
```
Example output:
```
['The fine-tuning can reduce the amount of expensive computing power and labeled data required to obtain large models adapted for niche use cases and business needs by using prior model training through transfer learning.',
 'fine-tuning, by utilizing prior model training through transfer learning, can reduce the amount of expensive computing power and labeled data required to obtain large models tailored for niche use cases and business needs.',
 'Fine-tunering by using prior model training through transfer learning can reduce the amount of expensive computing power and labeled data required to obtain large models adapted for niche use cases and business needs.',
 'Using transfer learning to use prior model training, fine-tuning can reduce the amount of expensive computing power and labeled data required for large models that are suitable in niche usage cases or businesses.']
```
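To rewrite several sentences at once, the inputs can be batched before generation. The helper below is an illustrative sketch rather than part of the original model card: `paraphrase_batch` is a hypothetical name, and it reuses the `tokenizer`, `model`, and `device` objects loaded above.
```python
def paraphrase_batch(texts, num_return_sequences=4):
    # Prepend the task prefix to every input and tokenize as one padded batch.
    prompts = [f"paraphraser: {t}" for t in texts]
    inputs = tokenizer(prompts, return_tensors="pt", padding=True,
                       truncation=True, max_length=64).to(device)
    outputs = model.generate(
        **inputs,
        num_beams=4,
        num_beam_groups=4,
        num_return_sequences=num_return_sequences,
        repetition_penalty=10.0,
        diversity_penalty=3.0,
        no_repeat_ngram_size=2,
        max_length=64,
    )
    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    # The decoded list is ordered batch-first; regroup candidates per input.
    return [decoded[i:i + num_return_sequences]
            for i in range(0, len(decoded), num_return_sequences)]
```
Calling `paraphrase_batch(["sentence one", "sentence two"])` returns one list of candidate rewrites per input sentence.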
📚 Documentation
Inference Parameters
| Parameter | Value |
|------|------|
| Number of beams (`num_beams`) | 3 |
| Number of beam groups (`num_beam_groups`) | 3 |
| Number of returned sequences (`num_return_sequences`) | 1 |
| Repetition penalty (`repetition_penalty`) | 3 |
| Diversity penalty (`diversity_penalty`) | 3.01 |
| No-repeat n-gram size (`no_repeat_ngram_size`) | 2 |
| Temperature (`temperature`) | 0.8 |
| Maximum length (`max_length`) | 64 |
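The sketch below shows how these settings map onto a `model.generate` call. It is not from the original card: the function name `paraphrase_with_defaults` is illustrative, and it assumes the `tokenizer`, `model`, and `device` objects from the usage example above. Note that these values differ slightly from those used in the basic usage example.
```python
def paraphrase_with_defaults(text):
    # Tokenize the prefixed input.
    input_ids = tokenizer(f"paraphraser: {text}", return_tensors="pt",
                          truncation=True, max_length=64).input_ids.to(device)
    outputs = model.generate(
        input_ids,
        num_beams=3,
        num_beam_groups=3,
        num_return_sequences=1,
        repetition_penalty=3.0,
        diversity_penalty=3.01,
        no_repeat_ngram_size=2,
        temperature=0.8,  # only takes effect when sampling is enabled; listed to mirror the table
        max_length=64,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
```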
Example Texts

| Example | Input text |
|------|------|
| AWS course | paraphraser: Learn to build generative AI applications with an expert AWS instructor with the 2-day Developing Generative AI Applications on AWS course. |
| Generative AI | paraphraser: In healthcare, Generative AI can help generate synthetic medical data to train machine learning models, develop new drug candidates, and design clinical trials. |
| Fine-tuning | paraphraser: By leveraging prior model training through transfer learning, fine-tuning can reduce the amount of expensive computing power and labeled data needed to obtain large models tailored to niche use cases and business needs. |
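These prompts already include the `paraphraser: ` prefix, so they can be tokenized directly instead of being passed through a helper that adds the prefix again. A minimal, illustrative sketch (again reusing `tokenizer`, `model`, and `device` from the usage example):
```python
example_prompts = [
    "paraphraser: Learn to build generative AI applications with an expert AWS instructor with the 2-day Developing Generative AI Applications on AWS course.",
    "paraphraser: In healthcare, Generative AI can help generate synthetic medical data to train machine learning models, develop new drug candidates, and design clinical trials.",
]

for prompt in example_prompts:
    # The prompts are already prefixed, so tokenize them as-is.
    input_ids = tokenizer(prompt, return_tensors="pt", truncation=True,
                          max_length=64).input_ids.to(device)
    outputs = model.generate(input_ids, num_beams=3, num_beam_groups=3,
                             num_return_sequences=1, repetition_penalty=3.0,
                             diversity_penalty=3.01, no_repeat_ngram_size=2,
                             max_length=64)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```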
📄 License
This project is released under the Apache-2.0 license.
🔧 Technical Details
The original documentation does not provide further implementation details.
🔜 Future Development
(See the Discussions section for any ongoing development or planned improvements.)