🚀 Google's T5 Version 1.1
Google's T5 Version 1.1 is an enhanced version of the original T5 model, with improvements to its architecture, pre-training setup, and parameter settings that yield better performance on natural language processing tasks.
🚀 Quick Start
This README provides detailed information about Google's T5 Version 1.1, including its improvements over the original model, its pre-training dataset, and the related research paper.
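As a minimal quick-start sketch, a checkpoint can be loaded with the Hugging Face `transformers` library; the `google/t5-v1_1-base` model name is used here as an example, and other sizes (small, large, xl, xxl) follow the same naming pattern:

```python
# Minimal sketch: load a T5 v1.1 checkpoint with Hugging Face transformers.
# Assumes the "google/t5-v1_1-base" checkpoint name; other sizes follow the same pattern.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-base")
model = T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-base")

# T5 v1.1 was pre-trained only with the span-corruption objective, so raw
# generations are generally not useful until the model is fine-tuned.
inputs = tokenizer("The <extra_id_0> walks in <extra_id_1> park", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```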
✨ Features
Improvements in Version 1.1
- Activation Function: the feed-forward hidden layer uses GEGLU activation instead of ReLU (introduced in the "GLU Variants Improve Transformer" paper).
- Dropout Setting: dropout was turned off during pre-training, which improved quality. It should be re-enabled during fine-tuning.
- Pre-training Data: the model was pre-trained on C4 only, without mixing in downstream tasks.
- Parameter Sharing: there is no parameter sharing between the embedding and classifier layers.
- Model Sizes: "xl" and "xxl" replace "3B" and "11B". The model shapes are different, with a larger `d_model` and smaller `num_heads` and `d_ff`.
Note: T5 Version 1.1 was only pre-trained on C4, without any supervised training, so it must be fine-tuned before it can be used on a downstream task.
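Below is a hedged sketch of a single fine-tuning step in the text-to-text format, assuming the `google/t5-v1_1-base` checkpoint and PyTorch; the 0.1 dropout rate and the optimizer settings are illustrative choices, not values prescribed by this model card:

```python
# Sketch of one fine-tuning step with dropout explicitly re-enabled
# (it was off during pre-training). The 0.1 rate and the optimizer
# settings are illustrative assumptions.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "google/t5-v1_1-base"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name, dropout_rate=0.1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Inputs and targets are both plain strings in the text-to-text framework.
inputs = tokenizer(
    ["summarize: The quick brown fox jumps over the lazy dog."],
    return_tensors="pt", padding=True,
)
labels = tokenizer(["A fox jumps over a dog."], return_tensors="pt", padding=True).input_ids

model.train()
loss = model(**inputs, labels=labels).loss  # cross-entropy over target tokens
loss.backward()
optimizer.step()
optimizer.zero_grad()
```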
📚 Documentation
Pretraining Dataset
The model was pre-trained on the C4 dataset.
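For reference, a short sketch of streaming examples from C4 with the Hugging Face `datasets` library; the `allenai/c4` repository name and its `en` configuration are assumptions about how the corpus is currently hosted on the Hub:

```python
# Sketch: stream a few English C4 records without downloading the full corpus.
# Assumes the corpus is hosted as "allenai/c4" with an "en" configuration.
from datasets import load_dataset

c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)
for i, example in enumerate(c4):
    print(example["text"][:100])  # records also carry "url" and "timestamp" fields
    if i >= 2:
        break
```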
Other Community Checkpoints
You can find other community checkpoints [here](https://huggingface.co/models?search=t5-v1_1).
Paper
The related research paper is Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.
Authors
The authors of the paper are Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.

📄 License
This project is licensed under the Apache 2.0 license.