ChatGPT Paraphraser on T5 Base: An Open-Source Text Paraphrasing Model - Generate High-Quality Paraphrased Texts for Free


Developed by humarin

A text paraphrasing model built on the T5-base architecture that generates high-quality paraphrases. Its authors claim it is one of the best paraphrasing models on the Hugging Face platform.

Text Generation

Transformers

English | Open Source License: OpenRAIL | #Multi-sentence paraphrasing #High-diversity generation #T5 architecture optimization

Downloads: 115.08k

Release Time: 3/17/2023

Model Overview

This model imitates ChatGPT's paraphrasing through transfer learning: it was fine-tuned on ChatGPT paraphrases of texts from the Quora, SQuAD 2.0, and CNN News datasets. It is primarily used for text rewriting and paraphrasing.

Model Features

Multi-source dataset training

Fine-tuned on ChatGPT paraphrases of texts from three sources: Quora paraphrase questions, SQuAD 2.0, and CNN News.

Advanced generation control

Supports generation controls such as beam search, beam groups, and a diversity penalty.

High-quality paraphrasing

Mimics ChatGPT's paraphrasing through transfer learning, generating diverse expressions that preserve the original meaning.
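The diversity penalty belongs to diverse (group) beam search: the beams are split into groups, and when a group scores its next token, tokens that earlier groups already chose at the same step are penalized. The sketch below is a simplified, pure-Python illustration of that Hamming-style penalty on a single step's scores. It is not the transformers implementation, and the function name and toy scores are hypothetical.

```python
from collections import Counter


def apply_hamming_diversity(scores, tokens_chosen_by_earlier_groups, diversity_penalty):
    """Return a copy of one group's next-token scores with a diversity penalty.

    scores: dict mapping token -> log-probability for the current group.
    tokens_chosen_by_earlier_groups: tokens picked by previous groups at this step.
    Each token's score drops by diversity_penalty times the number of
    earlier groups that already picked it (Counter returns 0 for unseen tokens).
    """
    counts = Counter(tokens_chosen_by_earlier_groups)
    return {tok: s - diversity_penalty * counts[tok] for tok, s in scores.items()}


# Toy example (hypothetical scores): group 1 already chose "must-see",
# and group 2 is now choosing its next token.
scores_group2 = {"must-see": -0.5, "top": -1.0, "best": -1.2}
penalized = apply_hamming_diversity(scores_group2, ["must-see"], diversity_penalty=3.0)
best = max(penalized, key=penalized.get)
print(best)  # top
```

With diversity_penalty=3.0, as in this model's defaults, a token that one earlier group already chose loses 3.0 in score for the current group, which pushes the groups toward different wordings.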

Model Capabilities

Text paraphrasing

Semantic-preserving rewriting

Diverse expression generation

Use Cases

Content creation

Travel guide rewriting

Diversifies expressions for tourist attraction descriptions

Generates 5 different versions of attraction introductions

News summary rewriting

Non-repetitive paraphrasing of news content

Multiple expression versions maintaining original meaning

Educational assistance

Learning material diversification

Generates different versions of explanations for the same knowledge point

Helps students understand concepts from multiple perspectives

🚀 ChatGPT Paraphraser on T5-base

This project offers a high-quality paraphrasing model trained on diverse datasets, aiming to generate paraphrases comparable in quality to ChatGPT's.

🚀 Quick Start

This model was trained on our ChatGPT paraphrase dataset, which combines Quora paraphrase questions, texts from SQuAD 2.0, and texts from the CNN news dataset.

Starting from the T5-base model, we used transfer learning so that the model generates paraphrases comparable to ChatGPT's. We consider it one of the best paraphrasing models on Hugging Face.
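Because the model is prompted with a "paraphrase: " prefix at inference time, one plausible way to turn such a dataset into seq2seq training pairs is one (prefixed source, paraphrase) pair per paraphrase. This is a hedged sketch: the card does not document the exact preprocessing, and both the record layout ({"text": ..., "paraphrases": [...]}) and the function name are hypothetical.

```python
def build_training_pairs(records, prefix="paraphrase: "):
    """Turn records into (input, target) pairs for seq2seq fine-tuning.

    Assumed (hypothetical) record layout:
        {"text": "original sentence", "paraphrases": ["p1", "p2", ...]}
    Each paraphrase becomes its own target for the prefixed source text.
    """
    pairs = []
    for record in records:
        source = prefix + record["text"]
        for target in record["paraphrases"]:
            pairs.append((source, target))
    return pairs


records = [
    {"text": "What are the best places to see in New York?",
     "paraphrases": ["Which places should I visit in New York?",
                     "What are some must-see places in New York?"]},
]
for source, target in build_training_pairs(records):
    print(source, "->", target)
```

Emitting one pair per paraphrase keeps every target a single fluent sentence, at the cost of repeating each source several times per epoch.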

✨ Features

High-Quality Paraphrasing: Generates paraphrases similar to ChatGPT's.
Diverse Training Data: Trained on texts from three sources (Quora, SQuAD 2.0, and CNN news) for better generalization.
Based on T5-base: Fine-tunes the T5-base model through transfer learning.

📦 Installation

The usage example requires the transformers library and PyTorch (pip install transformers torch); the T5 tokenizer may additionally need the sentencepiece package.

💻 Usage Examples

Basic Usage

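```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Use the GPU when one is available; fall back to the CPU otherwise.
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("humarin/chatgpt_paraphraser_on_T5_base")
model = AutoModelForSeq2SeqLM.from_pretrained("humarin/chatgpt_paraphraser_on_T5_base").to(device)


def paraphrase(
    question,
    num_beams=5,
    num_beam_groups=5,
    num_return_sequences=5,
    repetition_penalty=10.0,
    diversity_penalty=3.0,
    no_repeat_ngram_size=2,
    temperature=0.7,
    max_length=128
):
    # Prepend the "paraphrase: " task prefix, T5-style.
    input_ids = tokenizer(
        f'paraphrase: {question}',
        return_tensors="pt", padding="longest",
        max_length=max_length,
        truncation=True,  # inputs longer than max_length tokens are cut off
    ).input_ids.to(device)

    # Diverse (group) beam search: num_beams beams are split into num_beam_groups
    # groups, and diversity_penalty penalizes tokens that earlier groups already
    # chose at the same step. temperature only affects sampling, so it has no
    # effect here (do_sample is False); recent transformers versions may warn.
    outputs = model.generate(
        input_ids, temperature=temperature, repetition_penalty=repetition_penalty,
        num_return_sequences=num_return_sequences, no_repeat_ngram_size=no_repeat_ngram_size,
        num_beams=num_beams, num_beam_groups=num_beam_groups,
        max_length=max_length, diversity_penalty=diversity_penalty
    )

    # Decode all returned sequences into plain strings.
    res = tokenizer.batch_decode(outputs, skip_special_tokens=True)

    return res
```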
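paraphrase() handles one text per call. If you instead pass a list of texts to the tokenizer (with padding, and with the tokenizer's attention_mask also passed to generate so padding is ignored), model.generate returns num_return_sequences outputs per input, laid out consecutively, so the flat list from batch_decode has to be regrouped. A minimal helper for that step, assuming the decoded list has exactly num_inputs * num_return_sequences entries; the name group_outputs is hypothetical.

```python
def group_outputs(decoded, num_inputs, num_return_sequences):
    """Split a flat list of decoded outputs into one list per input text.

    generate() places the num_return_sequences outputs for each input
    next to each other, so input i owns
    decoded[i * num_return_sequences:(i + 1) * num_return_sequences].
    """
    if len(decoded) != num_inputs * num_return_sequences:
        raise ValueError("expected num_inputs * num_return_sequences outputs")
    return [
        decoded[i * num_return_sequences:(i + 1) * num_return_sequences]
        for i in range(num_inputs)
    ]


# Stand-in for tokenizer.batch_decode(...) on 2 inputs with 3 sequences each.
flat = ["a1", "a2", "a3", "b1", "b2", "b3"]
print(group_outputs(flat, num_inputs=2, num_return_sequences=3))
# [['a1', 'a2', 'a3'], ['b1', 'b2', 'b3']]
```

The length check fails loudly instead of silently misaligning paraphrases with the wrong inputs.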

Advanced Usage

Here are some actual input-output examples:

Input:

text = 'What are the best places to see in New York?'
paraphrase(text)

Output:

['What are some must-see places in New York?',
 'Can you suggest some must-see spots in New York?',
 'Where should one go to experience the best NYC has to offer?',
 'Which places should I visit in New York?',
 'What are the top destinations to explore in New York?']

Input:

text = "Rammstein's album Mutter was recorded in the south of France in May and June 2000, and mixed in Stockholm in October of that year."
paraphrase(text)

Output:

['In May and June 2000, Rammstein travelled to the south of France to record his album Mutter, which was mixed in Stockholm in October of that year.',
 'The album Mutter by Rammstein was recorded in the south of France during May and June 2000, with mixing taking place in Stockholm in October of that year.',
 'The album Mutter by Rammstein was recorded in the south of France during May and June 2000, with mixing taking place in Stockholm in October of that year. It',
 'Mutter, the album released by Rammstein, was recorded in southern France during May and June 2000, with mixing taking place between October and September.',
 'In May and June 2000, Rammstein recorded his album Mutter in the south of France, with the mix being made at Stockholm during October.']
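Candidates can come back as near-duplicates: in the second example, the second and third outputs differ only by a trailing "It". One simple post-processing step is to drop candidates that are too similar to the input or to a candidate already kept. This sketch is not part of the model or the original example; it uses Python's difflib.SequenceMatcher, and the 0.9 threshold is an arbitrary illustrative value that would need tuning.

```python
from difflib import SequenceMatcher


def drop_near_duplicates(source, candidates, threshold=0.9):
    """Keep candidates that are not near-copies of the source or of each other.

    Similarity is difflib's ratio() on lowercased, whitespace-stripped strings.
    A candidate is dropped when its ratio against the source or against any
    already-kept candidate is at or above threshold.
    """
    def similar(a, b):
        return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio() >= threshold

    kept = []
    for cand in candidates:
        if similar(cand, source) or any(similar(cand, k) for k in kept):
            continue
        kept.append(cand)
    return kept


source = "Where can I eat well in Paris?"
candidates = [
    "What are good places to eat in Paris?",
    "What are good places to eat in Paris? It",  # earlier candidate plus a stray word
    "Where can I eat well in Paris?",            # identical to the input
]
print(drop_near_duplicates(source, candidates))
# ['What are good places to eat in Paris?']
```

Character-level similarity catches stray-token duplicates like the one above; it does not catch factual drift, such as a paraphrase that changes a date, so outputs still need review.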

🔧 Technical Details

Training parameters

epochs = 5
batch_size = 64
max_length = 128
lr = 5e-5
batches_qty = 196465
betas = (0.9, 0.999)
eps = 1e-08
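betas and eps are Adam-style hyperparameters, but the card does not name the optimizer. Assuming plain Adam was used, the sketch below shows what lr, betas, and eps control in one bias-corrected update of a single scalar weight. It is a pure-Python illustration, not the authors' training code; in PyTorch the same values would map onto torch.optim.Adam(params, lr=5e-5, betas=(0.9, 0.999), eps=1e-8).

```python
import math


def adam_step(param, grad, m, v, t, lr=5e-5, betas=(0.9, 0.999), eps=1e-8):
    """One Adam update for a scalar weight; t is the 1-based step count.

    m and v are the running first and second moment estimates.
    Returns the updated (param, m, v).
    """
    beta1, beta2 = betas
    m = beta1 * m + (1 - beta1) * grad          # exponential average of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # exponential average of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# First step from zero moments: the update is about lr * sign(grad).
p, m, v = adam_step(param=1.0, grad=0.25, m=0.0, v=0.0, t=1)
print(p)  # approximately 1.0 - 5e-5
```

Because Adam normalizes by the root of the second moment, each step moves a weight by roughly lr regardless of gradient scale; eps only matters when gradients are tiny.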

Inference parameters

Parameter	Value
num_beams	5
num_beam_groups	5
num_return_sequences	5
repetition_penalty	10.0
diversity_penalty	3.0
no_repeat_ngram_size	2
temperature	0.7
max_length	128
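Inputs longer than max_length tokens are truncated (truncation=True in the usage example), so the end of a long passage would not be paraphrased at all. One workaround for multi-sentence text is to paraphrase it in sentence-sized chunks. The sketch below groups sentences under a budget; it splits sentences with a naive regex and counts whitespace-separated words as a rough stand-in for T5 tokens (an assumption: real token counts are usually higher), and the function name and default budget are hypothetical.

```python
import re


def chunk_sentences(text, max_words=60):
    """Group sentences into chunks of at most max_words words each.

    Sentences are split naively after ., ! or ? followed by whitespace.
    A single sentence longer than max_words becomes its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "One two three. Four five six. Seven eight nine ten."
print(chunk_sentences(text, max_words=6))
# ['One two three. Four five six.', 'Seven eight nine ten.']
```

Each chunk could then be passed to paraphrase() separately and the chosen candidates joined back together.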

BibTeX entry and citation info

@inproceedings{chatgpt_paraphraser,
  author={Vladimir Vorobev and Maxim Kuznetsov},
  title={A paraphrasing model based on ChatGPT paraphrases},
  year={2023}
}

📄 License

This project is released under the OpenRAIL license.
