🚀 Google's T5 Version 1.1
Google's T5 Version 1.1 is an enhanced version of the original T5 model. It introduces several improvements to the architecture and pre-training setup that make it a better starting point for a wide range of natural language processing tasks.
🚀 Quick Start
This README provides an in-depth introduction to Google's T5 Version 1.1, including its improvements, pre-training details, and the related research paper.
✨ Features
Improvements in Version 1.1
- Activation Function: T5 Version 1.1 uses GEGLU activation in the feed-forward hidden layer instead of ReLU; see the GEGLU paper (Shazeer, 2020, "GLU Variants Improve Transformer") for details.
- Dropout: Dropout was turned off during pre-training (leading to a quality improvement). It should be re-enabled during fine-tuning.
- Pre-training Data: It was pre-trained only on C4, without mixing in downstream tasks.
- Parameter Sharing: There is no parameter sharing between the embedding and classifier layers.
- Model Size: "xl" and "xxl" replace "3B" and "11B". The model shapes are a bit different, with a larger
d_model
and smaller num_heads
and d_ff
.
Note: T5 Version 1.1 was pre-trained on C4 only, without any supervised training. The model therefore needs to be fine-tuned before it can be used on a downstream task.
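Since the checkpoint ships without any supervised training, a typical workflow is to load it and fine-tune it on your task. Below is a minimal sketch using 🤗 Transformers; the checkpoint size (`google/t5-v1_1-base`) and the `dropout_rate=0.1` value used to re-enable dropout are illustrative assumptions.

```python
# Minimal sketch: load a T5 v1.1 checkpoint for fine-tuning with Hugging Face Transformers.
# Assumptions: "google/t5-v1_1-base" as the checkpoint size, and dropout_rate=0.1
# as the value used when re-enabling dropout for fine-tuning.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "google/t5-v1_1-base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Dropout was turned off during pre-training, so re-enable it here for fine-tuning.
model = T5ForConditionalGeneration.from_pretrained(model_name, dropout_rate=0.1)

# The checkpoint has seen no supervised data, so generation is not meaningful
# until the model has been fine-tuned on a downstream task (e.g., with Seq2SeqTrainer).
inputs = tokenizer("summarize: <your task-specific input text>", return_tensors="pt")
print(inputs.input_ids.shape)
```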
📚 Documentation
Pretraining Dataset
The model was pre-trained on the C4 dataset (the "Colossal Clean Crawled Corpus" introduced in the paper).
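For reference, the corpus can be inspected directly. The sketch below uses 🤗 Datasets in streaming mode; the `allenai/c4` Hub mirror and its `en` configuration are assumed here for illustration.

```python
# Minimal sketch: stream a few C4 examples with Hugging Face Datasets.
# Assumes the "allenai/c4" Hub mirror and its English ("en") configuration.
from itertools import islice
from datasets import load_dataset

c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)

# Streaming avoids downloading the full corpus; just peek at three records.
for example in islice(c4, 3):
    print(example["text"][:200])
```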
Other Community Checkpoints
You can find other community checkpoints [here](https://huggingface.co/models?search=t5-v1_1).
Paper
The related research paper is Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer ([arXiv:1910.10683](https://arxiv.org/abs/1910.10683)).
Authors
The authors of the paper are Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
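To make the "text-to-text" framing concrete, the sketch below shows how different tasks reduce to (input text, target text) pairs. The task prefixes and examples are purely illustrative; with T5 Version 1.1 you choose your own format when fine-tuning, since no supervised mixture was used during pre-training.

```python
# Illustrative sketch of the text-to-text framing: every task becomes an
# (input text, target text) pair. Prefixes and examples are made up for illustration.
examples = [
    # Summarization: input is the document, target is the summary.
    ("summarize: The quick brown fox jumped over the lazy dog near the river bank.",
     "A fox jumped over a dog."),
    # Sentiment classification: the label itself is generated as text.
    ("sst2 sentence: This movie was absolutely wonderful!",
     "positive"),
    # Translation: the language pair is stated in the prefix.
    ("translate English to German: The house is small.",
     "Das Haus ist klein."),
]

for source, target in examples:
    print(f"INPUT : {source}\nTARGET: {target}\n")
```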
📄 License
This project is licensed under the Apache 2.0 license.