🚀 Google's T5 Version 1.1
Google's T5 Version 1.1 is an enhanced version of the original T5 model. It introduces several improvements to the architecture and pre-training setup that make it a better starting point for a wide range of natural language processing tasks.
🚀 Quick Start
This README provides an in-depth introduction to Google's T5 Version 1.1, including its improvements, pre-training details, and the related research paper.
✨ Features
Improvements in Version 1.1
- Activation Function: T5 Version 1.1 uses GEGLU activation in the feed-forward hidden layer instead of ReLU; see the GEGLU paper (Shazeer, 2020, "GLU Variants Improve Transformer") for details.
- Dropout: Dropout was turned off during pre-training (leading to a quality improvement). It should be re-enabled during fine-tuning.
- Pre-training Data: It was pre-trained only on C4, without mixing in downstream tasks.
- Parameter Sharing: There is no parameter sharing between the embedding and classifier layers.
- Model Size: "xl" and "xxl" replace "3B" and "11B". The model shapes are a bit different, with a larger
d_model
and smaller num_heads
and d_ff
.
Note: T5 Version 1.1 was pre-trained on C4 only, without any supervised training. The model therefore needs to be fine-tuned before it can be used on a downstream task.
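Since the checkpoint ships without any supervised training, a typical workflow is to load it and fine-tune it on your task. Below is a minimal sketch using 🤗 Transformers; the checkpoint size (`google/t5-v1_1-base`) and the `dropout_rate=0.1` value used to re-enable dropout are illustrative assumptions.

```python
# Minimal sketch: load a T5 v1.1 checkpoint for fine-tuning with Hugging Face Transformers.
# Assumptions: "google/t5-v1_1-base" as the checkpoint size, and dropout_rate=0.1
# as the value used when re-enabling dropout for fine-tuning.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "google/t5-v1_1-base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Dropout was turned off during pre-training, so re-enable it here for fine-tuning.
model = T5ForConditionalGeneration.from_pretrained(model_name, dropout_rate=0.1)

# The checkpoint has seen no supervised data, so generation is not meaningful
# until the model has been fine-tuned on a downstream task (e.g., with Seq2SeqTrainer).
inputs = tokenizer("summarize: <your task-specific input text>", return_tensors="pt")
print(inputs.input_ids.shape)
```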
📚 Documentation
Pretraining Dataset
The model was pre-trained on the C4 dataset (the "Colossal Clean Crawled Corpus" introduced in the paper).
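For reference, the corpus can be inspected directly. The sketch below uses 🤗 Datasets in streaming mode; the `allenai/c4` Hub mirror and its `en` configuration are assumed here for illustration.

```python
# Minimal sketch: stream a few C4 examples with Hugging Face Datasets.
# Assumes the "allenai/c4" Hub mirror and its English ("en") configuration.
from itertools import islice
from datasets import load_dataset

c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)

# Streaming avoids downloading the full corpus; just peek at three records.
for example in islice(c4, 3):
    print(example["text"][:200])
```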
Other Community Checkpoints
You can find other community checkpoints [here](https://huggingface.co/models?search=t5-v1_1).
Paper
The related research paper is Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer ([arXiv:1910.10683](https://arxiv.org/abs/1910.10683)).
Authors
The authors of the paper are Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
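To make the "text-to-text" framing concrete, the sketch below shows how different tasks reduce to (input text, target text) pairs. The task prefixes and examples are purely illustrative; with T5 Version 1.1 you choose your own format when fine-tuning, since no supervised mixture was used during pre-training.

```python
# Illustrative sketch of the text-to-text framing: every task becomes an
# (input text, target text) pair. Prefixes and examples are made up for illustration.
examples = [
    # Summarization: input is the document, target is the summary.
    ("summarize: The quick brown fox jumped over the lazy dog near the river bank.",
     "A fox jumped over a dog."),
    # Sentiment classification: the label itself is generated as text.
    ("sst2 sentence: This movie was absolutely wonderful!",
     "positive"),
    # Translation: the language pair is stated in the prefix.
    ("translate English to German: The house is small.",
     "Das Haus ist klein."),
]

for source, target in examples:
    print(f"INPUT : {source}\nTARGET: {target}\n")
```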
📄 License
This project is licensed under the Apache 2.0 license.