🚀 Google's T5 Version 1.1
Google's T5 Version 1.1 is an enhanced version of the original T5 model, with improvements to its architecture, pre-training setup, and parameter settings that yield better performance on natural language processing tasks.
🚀 Quick Start
This README provides detailed information about Google's T5 Version 1.1, including its improvements over the original model, its pre-training dataset, and the related research paper.
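As a minimal quick-start sketch, a checkpoint can be loaded with the Hugging Face `transformers` library; the `google/t5-v1_1-base` model name is used here as an example, and other sizes (small, large, xl, xxl) follow the same naming pattern:

```python
# Minimal sketch: load a T5 v1.1 checkpoint with Hugging Face transformers.
# Assumes the "google/t5-v1_1-base" checkpoint name; other sizes follow the same pattern.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-base")
model = T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-base")

# T5 v1.1 was pre-trained only with the span-corruption objective, so raw
# generations are generally not useful until the model is fine-tuned.
inputs = tokenizer("The <extra_id_0> walks in <extra_id_1> park", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```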
✨ Features
Improvements in Version 1.1
- Activation Function: the feed-forward hidden layer uses GEGLU activation instead of ReLU (introduced in the "GLU Variants Improve Transformer" paper).
- Dropout Setting: dropout was turned off during pre-training, which improved quality. It should be re-enabled during fine-tuning.
- Pre-training Data: the model was pre-trained on C4 only, without mixing in downstream tasks.
- Parameter Sharing: there is no parameter sharing between the embedding and classifier layers.
- Model Sizes: "xl" and "xxl" replace "3B" and "11B". The model shapes are different, with a larger `d_model` and smaller `num_heads` and `d_ff`.
Note: T5 Version 1.1 was only pre-trained on C4, without any supervised training, so it must be fine-tuned before it can be used on a downstream task.
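Below is a hedged sketch of a single fine-tuning step in the text-to-text format, assuming the `google/t5-v1_1-base` checkpoint and PyTorch; the 0.1 dropout rate and the optimizer settings are illustrative choices, not values prescribed by this model card:

```python
# Sketch of one fine-tuning step with dropout explicitly re-enabled
# (it was off during pre-training). The 0.1 rate and the optimizer
# settings are illustrative assumptions.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "google/t5-v1_1-base"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name, dropout_rate=0.1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Inputs and targets are both plain strings in the text-to-text framework.
inputs = tokenizer(
    ["summarize: The quick brown fox jumps over the lazy dog."],
    return_tensors="pt", padding=True,
)
labels = tokenizer(["A fox jumps over a dog."], return_tensors="pt", padding=True).input_ids

model.train()
loss = model(**inputs, labels=labels).loss  # cross-entropy over target tokens
loss.backward()
optimizer.step()
optimizer.zero_grad()
```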
📚 Documentation
Pretraining Dataset
The model was pre-trained on the C4 dataset.
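For reference, a short sketch of streaming examples from C4 with the Hugging Face `datasets` library; the `allenai/c4` repository name and its `en` configuration are assumptions about how the corpus is currently hosted on the Hub:

```python
# Sketch: stream a few English C4 records without downloading the full corpus.
# Assumes the corpus is hosted as "allenai/c4" with an "en" configuration.
from datasets import load_dataset

c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)
for i, example in enumerate(c4):
    print(example["text"][:100])  # records also carry "url" and "timestamp" fields
    if i >= 2:
        break
```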
Other Community Checkpoints
You can find other community checkpoints [here](https://huggingface.co/models?search=t5-v1_1).
Paper
The related research paper is Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.
Authors
The authors of the paper are Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.

📄 License
This project is licensed under the Apache 2.0 license.