Italian T5 Base (Oscar) 🇮🇹
This repository contains the model formerly known as `gsarti/t5-base-it`.
The IT5 model family is the first attempt to pretrain large-scale sequence-to-sequence transformer models for the Italian language, following the approach of the original T5 model.
This model is part of the project "IT5: Large-Scale Text-to-Text Pretraining for Italian Language Understanding and Generation" (to be released), by Gabriele Sarti, with the support of Hugging Face and with TPU usage sponsored by Google's TPU Research Cloud. All training was done on a single TPU v3-8 VM on Google Cloud. Check the Tensorboard tab of the repository for an overview of the training process.
The inference widget is deactivated because the model needs task-specific seq2seq fine-tuning on a downstream task to be practical. The model `gsarti/it5-base-nli` shows an example of this model fine-tuned on a downstream NLI task.
✨ Features
Model variants
This repository contains the checkpoints for a base version of the model trained on the OSCAR corpus using 🤗 Datasets. The original `t5-base` model configuration was used, except for the `dropout_rate` parameter, which was set to 0 instead of 0.1 during pre-training, following the implementation of `t5-v1.1`. The tokenizer is a `SentencePieceUnigramTokenizer` trained on the first 2M sentences of the Italian portion of the `mC4` corpus. An improved version of the model trained on the Thoroughly Cleaned Italian mC4 Corpus (~41B words, ~275GB) is also available as `gsarti/it5-base`. The training procedure is available on GitHub.
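As a quick illustration of the tokenizer described above, the snippet below (not part of the original card) loads it through 🤗 Transformers and shows how it segments an Italian sentence; the example sentence is arbitrary.

```python
# Illustrative sketch: load the SentencePiece Unigram tokenizer shipped with
# this checkpoint and inspect how it segments an Italian sentence.
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("gsarti/it5-base-oscar")
print(tokenizer.vocab_size)  # size of the SentencePiece vocabulary
print(tokenizer.tokenize(
    "Il modello è stato pre-addestrato sulla porzione italiana del corpus OSCAR."
))
```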
The following table summarizes the parameters for all available models:
| Property | it5-small | it5-base | it5-large | it5-base-oscar (this one) |
|---|---|---|---|---|
| dataset | gsarti/clean_mc4_it | gsarti/clean_mc4_it | gsarti/clean_mc4_it | oscar/unshuffled_deduplicated_it |
| architecture | google/t5-v1_1-small | google/t5-v1_1-base | google/t5-v1_1-large | t5-base |
| learning rate | 5e-3 | 5e-3 | 5e-3 | 1e-2 |
| steps | 1,050,000 | 1,050,000 | 2,100,000 | 258,000 |
| training time | 36 hours | 101 hours | 370 hours | 98 hours |
| ff projection | gated-gelu | gated-gelu | gated-gelu | relu |
| tie embeds | false | false | false | true |
| optimizer | adafactor | adafactor | adafactor | adafactor |
| max seq. length | 512 | 512 | 512 | 512 |
| per-device batch size | 16 | 16 | 8 | 16 |
| tot. batch size | 128 | 128 | 64 | 128 |
| weight decay | 1e-3 | 1e-3 | 1e-2 | 1e-3 |
| validation split size | 15K examples | 15K examples | 15K examples | 15K examples |
The high training time of `it5-base-oscar` was due to a bug in the training script.
For a list of individual model parameters, refer to the `config.json` file in the respective repositories.
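For example, the values summarized in the table can also be read programmatically from the hosted configuration. This is a minimal sketch (attribute names follow the standard 🤗 Transformers `T5Config`); note that the values stored in the released `config.json` may differ from the pre-training settings, e.g. dropout is commonly re-enabled for fine-tuning.

```python
# Minimal sketch: inspect the released config.json of this checkpoint.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("gsarti/it5-base-oscar")
print(config.feed_forward_proj)    # "relu" for this checkpoint (see table above)
print(config.tie_word_embeddings)  # True for this checkpoint
print(config.dropout_rate)         # 0 was used during pre-training; the released
                                   # config may re-enable dropout for fine-tuning
```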
📦 Installation
No dedicated package is needed: the checkpoints can be loaded with the 🤗 Transformers library and the SentencePiece backend used by the T5 tokenizer, e.g. `pip install transformers sentencepiece`, together with your preferred framework (PyTorch, Flax, or TensorFlow).
💻 Usage Examples
Basic Usage
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the SentencePiece tokenizer and the PyTorch checkpoint.
tokenizer = T5Tokenizer.from_pretrained("gsarti/it5-base-oscar")
model = T5ForConditionalGeneration.from_pretrained("gsarti/it5-base-oscar")
```
Note: you will need to fine-tune the model on your downstream seq2seq task to use it; see `gsarti/it5-base-nli` (mentioned above) for an example.
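As a purely illustrative sketch of what such fine-tuning involves, the snippet below runs a single seq2seq training step on a made-up input/target pair; the task prefix, example texts, and optimizer settings are placeholders, not part of the original training setup.

```python
# Hypothetical single fine-tuning step (placeholder data and hyperparameters).
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("gsarti/it5-base-oscar")
model = T5ForConditionalGeneration.from_pretrained("gsarti/it5-base-oscar")

# Placeholder input/target pair; replace with your downstream task data.
inputs = tokenizer(
    "riassumi: IT5 è una famiglia di modelli sequence-to-sequence pre-addestrati su testi italiani.",
    return_tensors="pt",
)
labels = tokenizer("IT5 è pre-addestrato sull'italiano.", return_tensors="pt").input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
outputs = model(**inputs, labels=labels)  # seq2seq cross-entropy loss
outputs.loss.backward()
optimizer.step()
```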
Advanced Usage
```python
from transformers import FlaxT5ForConditionalGeneration, TFT5ForConditionalGeneration

# Flax and TensorFlow checkpoints are also available in this repository.
model_flax = FlaxT5ForConditionalGeneration.from_pretrained("gsarti/it5-base-oscar")
model_tf = TFT5ForConditionalGeneration.from_pretrained("gsarti/it5-base-oscar")
```
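Until it is fine-tuned, the pretrained checkpoint only performs the T5-style span-corruption objective, i.e. filling sentinel placeholders such as `<extra_id_0>`. The sketch below (assuming the standard T5 sentinel-token setup) makes this concrete.

```python
# Sketch of the raw pre-training objective: the un-fine-tuned model can only
# fill sentinel spans, not solve downstream tasks directly.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("gsarti/it5-base-oscar")
model = T5ForConditionalGeneration.from_pretrained("gsarti/it5-base-oscar")

inputs = tokenizer("Roma è la <extra_id_0> d'Italia.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```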
📚 Documentation
Limitations
Due to the nature of the web-scraped corpus on which IT5 models were trained, it is likely that their usage could reproduce and amplify pre-existing biases in the data, resulting in potentially harmful content such as racial or gender stereotypes and conspiracist views. For this reason, the study of such biases is explicitly encouraged, and model usage should ideally be restricted to research-oriented and non-user-facing endeavors.
Model curators
For problems or updates on this model, please contact gabriele.sarti996@gmail.com.
📄 License
This model is released under the Apache 2.0 license.
📖 Citation Information
```bibtex
@article{sarti-nissim-2022-it5,
  title   = {IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation},
  author  = {Sarti, Gabriele and Nissim, Malvina},
  journal = {ArXiv preprint 2203.03759},
  url     = {https://arxiv.org/abs/2203.03759},
  year    = {2022},
  month   = {mar}
}
```