🚀 Google's T5 Version 1.1
Google's T5 Version 1.1 is an improved version of the original T5 model, with architectural and pre-training changes that improve quality on downstream natural language processing tasks after fine-tuning.
🚀 Quick Start
This README provides detailed information about Google's T5 Version 1.1, including its improvements, pre-training details, and related research.
✨ Features
- Activation Function Upgrade: T5 Version 1.1 uses a GEGLU activation in the feed-forward hidden layer instead of ReLU (see the sketch after this list for the idea behind gated GELU).
- Dropout Adjustment: Dropout was turned off during pre-training, which improved quality. Re-enable dropout during fine-tuning.
- Pre-training Strategy: The model was pre-trained only on C4, without mixing in downstream tasks.
- Parameter Non-sharing: There is no parameter sharing between the embedding and classifier layers.
- Model Naming and Shape Changes: "xl" and "xxl" replace "3B" and "11B". The model shapes also differ, with a larger `d_model` and smaller `num_heads` and `d_ff`.
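
As a rough illustration of the gated-GELU (GEGLU) feed-forward change mentioned above, here is a minimal PyTorch sketch. It is a generic implementation of the technique, not the exact Hugging Face T5 code; the class name, bias settings, and dimensions are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedGeluFeedForward(nn.Module):
    """Minimal gated-GELU (GEGLU-style) feed-forward block.

    Two parallel projections are applied to the input; the GELU of one
    gates the other before the output projection. Names and sizes are
    placeholders, not the exact T5 v1.1 implementation.
    """

    def __init__(self, d_model: int, d_ff: int, dropout_rate: float = 0.0):
        super().__init__()
        self.wi_0 = nn.Linear(d_model, d_ff, bias=False)  # gate projection
        self.wi_1 = nn.Linear(d_model, d_ff, bias=False)  # value projection
        self.wo = nn.Linear(d_ff, d_model, bias=False)    # output projection
        self.dropout = nn.Dropout(dropout_rate)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        gated = F.gelu(self.wi_0(hidden_states)) * self.wi_1(hidden_states)
        return self.wo(self.dropout(gated))


# Toy usage: batch of 2 sequences of length 4, with hypothetical sizes.
ffn = GatedGeluFeedForward(d_model=512, d_ff=1024)
out = ffn(torch.randn(2, 4, 512))
print(out.shape)  # torch.Size([2, 4, 512])
```

Because the gated variant computes two parallel projections, it roughly doubles the feed-forward parameters for a given `d_ff`, which may be related to the smaller `d_ff` noted above.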
Note: T5 Version 1.1 was pre-trained only on C4, without any supervised training, so it must be fine-tuned before it is used on a downstream task.
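
Because the checkpoint must be fine-tuned before use, the hedged sketch below shows how one might load a T5 Version 1.1 model with the Hugging Face Transformers library and compute a fine-tuning loss. The checkpoint name `google/t5-v1_1-base`, the dropout value of 0.1, and the toy inputs are assumptions for illustration only.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Checkpoint name is an assumption; any t5-v1_1 size could be substituted.
checkpoint = "google/t5-v1_1-base"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(
    checkpoint,
    dropout_rate=0.1,  # re-enable dropout for fine-tuning; it was off during pre-training
)

# Toy supervised pair; a real fine-tuning run would iterate over a task dataset.
inputs = tokenizer("summarize: studies have shown that owning a dog is good for you",
                   return_tensors="pt")
labels = tokenizer("owning a dog is good for you", return_tensors="pt").input_ids

loss = model(**inputs, labels=labels).loss
print(float(loss))
```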
📚 Documentation
Pretraining Dataset
The model was pre-trained on the C4 dataset.
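
For readers who want to inspect the pre-training corpus, the sketch below streams a few C4 examples with the Hugging Face `datasets` library. The dataset identifier `allenai/c4` and the `en` configuration are assumptions about how C4 is currently hosted.

```python
from datasets import load_dataset

# Stream the English C4 split instead of downloading the full corpus.
# The dataset id and config name are assumptions about the Hub hosting.
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)

for example in c4.take(3):
    print(example["text"][:80])
```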
Other Community Checkpoints
Other community T5 Version 1.1 checkpoints are available on the Hugging Face Hub.
Paper
The related research paper is Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.
Authors
The authors of the paper are Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, the researchers explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Their systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from their exploration with scale and their new “Colossal Clean Crawled Corpus”, they achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, they release their dataset, pre-trained models, and code.
📄 License
This project is licensed under the Apache-2.0 license.