🚀 Google's T5 Version 1.1
Google's T5 Version 1.1 is an improved version of the original T5 model, with architectural and pre-training changes that improve quality on downstream natural language processing tasks after fine-tuning.
🚀 Quick Start
This README provides detailed information about Google's T5 Version 1.1, including its improvements, pre-training details, and related research.
✨ Features
- Activation Function Upgrade: T5 Version 1.1 uses a GEGLU activation in the feed-forward hidden layer instead of ReLU (see the sketch after this list for the idea behind gated GELU).
- Dropout Adjustment: Dropout was turned off during pre-training, which improved quality. Re-enable dropout during fine-tuning.
- Pre-training Strategy: The model was pre-trained only on C4, without mixing in downstream tasks.
- Parameter Non-sharing: There is no parameter sharing between the embedding and classifier layers.
- Model Naming and Shape Changes: "xl" and "xxl" replace "3B" and "11B". The model shapes also differ, with a larger `d_model` and smaller `num_heads` and `d_ff`.
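
As a rough illustration of the gated-GELU (GEGLU) feed-forward change mentioned above, here is a minimal PyTorch sketch. It is a generic implementation of the technique, not the exact Hugging Face T5 code; the class name, bias settings, and dimensions are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedGeluFeedForward(nn.Module):
    """Minimal gated-GELU (GEGLU-style) feed-forward block.

    Two parallel projections are applied to the input; the GELU of one
    gates the other before the output projection. Names and sizes are
    placeholders, not the exact T5 v1.1 implementation.
    """

    def __init__(self, d_model: int, d_ff: int, dropout_rate: float = 0.0):
        super().__init__()
        self.wi_0 = nn.Linear(d_model, d_ff, bias=False)  # gate projection
        self.wi_1 = nn.Linear(d_model, d_ff, bias=False)  # value projection
        self.wo = nn.Linear(d_ff, d_model, bias=False)    # output projection
        self.dropout = nn.Dropout(dropout_rate)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        gated = F.gelu(self.wi_0(hidden_states)) * self.wi_1(hidden_states)
        return self.wo(self.dropout(gated))


# Toy usage: batch of 2 sequences of length 4, with hypothetical sizes.
ffn = GatedGeluFeedForward(d_model=512, d_ff=1024)
out = ffn(torch.randn(2, 4, 512))
print(out.shape)  # torch.Size([2, 4, 512])
```

Because the gated variant computes two parallel projections, it roughly doubles the feed-forward parameters for a given `d_ff`, which may be related to the smaller `d_ff` noted above.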
Note: T5 Version 1.1 was pre-trained only on C4, without any supervised training, so it must be fine-tuned before it is used on a downstream task.
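
Because the checkpoint must be fine-tuned before use, the hedged sketch below shows how one might load a T5 Version 1.1 model with the Hugging Face Transformers library and compute a fine-tuning loss. The checkpoint name `google/t5-v1_1-base`, the dropout value of 0.1, and the toy inputs are assumptions for illustration only.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Checkpoint name is an assumption; any t5-v1_1 size could be substituted.
checkpoint = "google/t5-v1_1-base"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(
    checkpoint,
    dropout_rate=0.1,  # re-enable dropout for fine-tuning; it was off during pre-training
)

# Toy supervised pair; a real fine-tuning run would iterate over a task dataset.
inputs = tokenizer("summarize: studies have shown that owning a dog is good for you",
                   return_tensors="pt")
labels = tokenizer("owning a dog is good for you", return_tensors="pt").input_ids

loss = model(**inputs, labels=labels).loss
print(float(loss))
```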
📚 Documentation
Pretraining Dataset
The model was pre-trained on the C4 dataset.
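
For readers who want to inspect the pre-training corpus, the sketch below streams a few C4 examples with the Hugging Face `datasets` library. The dataset identifier `allenai/c4` and the `en` configuration are assumptions about how C4 is currently hosted.

```python
from datasets import load_dataset

# Stream the English C4 split instead of downloading the full corpus.
# The dataset id and config name are assumptions about the Hub hosting.
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)

for example in c4.take(3):
    print(example["text"][:80])
```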
Other Community Checkpoints
Other community T5 Version 1.1 checkpoints are available on the Hugging Face Hub.
Paper
The related research paper is Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.
Authors
The authors of the paper are Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, the researchers explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Their systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from their exploration with scale and their new “Colossal Clean Crawled Corpus”, they achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, they release their dataset, pre-trained models, and code.
📄 License
This project is licensed under the Apache-2.0 license.