Mt5 Base It Paraphraser

Developed by aiknowyou

This is an Italian paraphrasing generation model based on the MT5-base architecture, fine-tuned on the TaPaCo and STS Benchmark datasets.

Text Generation

Transformers

Tags: Italian paraphrasing, Multi-sentence generation, Text rewriting

Downloads 55

Release date: 8/26/2022

Model Overview

This model is specifically designed for generating paraphrases of Italian text, capable of rewriting input sentences into semantically similar but differently expressed versions.

Model Features

Multilingual support

Built on the mT5 architecture, which was pre-trained on text covering 101 languages. This checkpoint is fine-tuned specifically for Italian paraphrasing.

High-quality paraphrasing

Fine-tuned on TaPaCo and STS Benchmark datasets to generate semantically preserved and diverse paraphrases.

Flexible configuration

Supports adjusting parameters like beam size and n-gram repetition limits to control generation quality.
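As a rough illustration of the configuration options described above, the sketch below assembles the keyword arguments that the usage example in this document passes to `model.generate`. The helper name `build_generation_kwargs` is hypothetical and not part of the model or of the Transformers library; the argument names (`num_beams`, `num_return_sequences`, `encoder_no_repeat_ngram_size`) are the ones used in the usage example, and the defaults are copied from it.

```python
def build_generation_kwargs(beams=100, grams=10, num_return_sequences=5):
    """Hypothetical helper: collect beam-search settings for `model.generate`.

    Defaults mirror the usage example in this document.
    """
    if beams < 1:
        raise ValueError("beams must be at least 1")
    # Beam search can return at most one sequence per beam.
    if num_return_sequences > beams:
        raise ValueError("num_return_sequences must not exceed beams")
    return {
        # Number of candidate sequences kept alive at each decoding step.
        "num_beams": beams,
        # How many of the finished beams are returned as paraphrases.
        "num_return_sequences": num_return_sequences,
        # N-grams of this size that occur in the input may not be copied
        # into the output, which pushes the model away from verbatim copies.
        "encoder_no_repeat_ngram_size": grams,
    }


if __name__ == "__main__":
    # A lighter configuration than the defaults, e.g. for CPU-only runs.
    print(build_generation_kwargs(beams=10, grams=4, num_return_sequences=3))
```

The resulting dictionary could be unpacked into the call, for example `model.generate(**inputs, **kwargs)`. Fewer beams generally decode faster, at the cost of exploring fewer candidates.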

Model Capabilities

Text paraphrase generation

Semantic-preserving rewriting

Diverse expression generation

Use Cases

Text processing

Content rewriting

Generating semantically equivalent but differently expressed versions of original content

Returns several paraphrases per input sentence (5 by default in the usage example, adjustable through num_return_sequences)

Data augmentation

Generating variants of training data for NLP tasks

Increases training data diversity without altering semantics

Educational applications

Language learning

Demonstrating different expressions of the same meaning for language learners

Helps students master richer expressions
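For the data augmentation use case listed above, generated candidates are often post-filtered before being added to a training set, since beam search can return near-identical strings. The following is a minimal sketch of such a filter, not something shipped with the model: the function name `filter_paraphrases` and the normalization rule (lowercasing, collapsing whitespace, ignoring trailing punctuation) are assumptions chosen for illustration.

```python
def filter_paraphrases(source, candidates):
    """Hypothetical filter: keep candidates that differ from the source
    and from each other after light normalization."""

    def normalize(text):
        # Lowercase, collapse runs of whitespace, ignore trailing punctuation.
        return " ".join(text.lower().split()).rstrip(".!? ")

    seen = {normalize(source)}
    kept = []
    for candidate in candidates:
        key = normalize(candidate)
        # Skip empty strings, copies of the source, and repeated candidates.
        if key and key not in seen:
            seen.add(key)
            kept.append(candidate)
    return kept


if __name__ == "__main__":
    source = "Due amici si incontrano al bar."
    candidates = [
        "Due amici si incontrano al bar",   # same as the source, dropped
        "Due amici si vedono al bar.",      # kept
        "due amici si vedono al bar.",      # duplicate of the previous, dropped
        "Due amici parlano al bar.",        # kept
    ]
    print(filter_paraphrases(source, candidates))
```

The filter is purely string-based, so it cannot tell whether a kept candidate still preserves the meaning of the source; a semantic check would have to be added separately.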

🚀 MT5-base fine-tuned on TaPaCo and STS Benchmark datasets for paraphrasing

An Italian paraphraser based on MT5-base, fine-tuned on TaPaCo and STS Benchmark datasets to generate paraphrases effectively.

🚀 Quick Start

The mT5 model was introduced in "mT5: A massively multilingual pre-trained text-to-text transformer" by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, and Colin Raffel in 2020. The abstract is summarized below:

The recent "Text-to-Text Transfer Transformer" (T5) utilized a unified text-to-text format and scale to achieve state-of-the-art results on a wide range of English-language NLP tasks. In this paper, the authors introduce mT5, a multilingual variant of T5 pre-trained on a new Common Crawl-based dataset covering 101 languages. They detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. They also describe a simple technique to prevent "accidental translation" in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.

✨ Features

Model fine-tuning

The training script is a slightly modified version of this Colab notebook. Before training, an adapted Italian version of the mT5 model was prepared by following this other Colab notebook.

💻 Usage Examples

Basic Usage

from transformers import T5ForConditionalGeneration, T5Tokenizer
import torch

raw_model = 'aiknowyou/mt5-base-it-paraphraser'

# Load the fine-tuned model and its tokenizer
model = T5ForConditionalGeneration.from_pretrained(raw_model)
tokenizer = T5Tokenizer.from_pretrained(raw_model)

# Run on the GPU when one is available, otherwise on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def paraphrase(text, beams=100, grams=10, num_return_sequences=5):
    # Tokenize the input and move the tensors to the model's device
    x = tokenizer(text, return_tensors='pt', padding=True).to(model.device)
    # Cap the output at 1.5x the input length (in tokens) plus a 10-token margin
    max_size = int(x.input_ids.shape[1] * 1.5 + 10)
    # Beam search; n-grams of size `grams` from the input may not be copied to the output
    out = model.generate(**x, encoder_no_repeat_ngram_size=grams, num_beams=beams, num_return_sequences=num_return_sequences, max_length=max_size)
    return tokenizer.batch_decode(out, skip_special_tokens=True)

sentence = "Due amici si incontrano al bar per discutere del modo migliore di generare parafrasi."
print(paraphrase(sentence))

Output

Original sentence ::
"Due amici si incontrano al bar per discutere del modo migliore di generare parafrasi."

Paraphrases ::
'Due amici stanno discutendo del modo migliore per generare parafrasi.',
'Due amici si incontrano a un bar per discutere del modo migliore per generare parafrasi.',
'Due amici si incontrano al bar per parlare del modo migliore per generare parafrasi.',
'Due amici sono seduti al bar per discutere del modo migliore per generare parafrasi.',
'Due amici si incontrano in un bar per discutere del modo migliore per generare parafrasi.'
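
The usage example above limits the output length with the expression `int(x.input_ids.shape[1] * 1.5 + 10)`. The short sketch below works through that arithmetic on its own, without loading the model. The function name `max_output_length` and its `ratio` and `margin` parameters are hypothetical; only the constants 1.5 and 10 come from the example.

```python
def max_output_length(input_token_count, ratio=1.5, margin=10):
    """Hypothetical helper reproducing the length cap from the usage example.

    The cap grows with the input: `ratio` times the number of input tokens,
    plus a fixed `margin`, truncated to an integer.
    """
    return int(input_token_count * ratio + margin)


if __name__ == "__main__":
    for n_tokens in (8, 17, 20):
        # e.g. 20 input tokens -> int(20 * 1.5 + 10) = 40
        print(n_tokens, "->", max_output_length(n_tokens))
```

Token counts depend on the tokenizer, so the same sentence may receive a different cap than its word count suggests.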

📚 Documentation

Contribution

Thanks to @tradicio for adding this model.

📄 License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

📦 Additional Information

Property	Details
Datasets	tapaco, stsb_multi_mt
Model Type	mt5
Task	paraphrase-generation, paraphrasing
License	cc-by-nc-sa-4.0
