
T5 v1.1 XXL

Developed by Google
T5 v1.1 is Google's improved text-to-text Transformer model, employing the GEGLU activation function and a purely unsupervised pretraining strategy
Downloads: 597.64k
Release Time: 3/2/2022

Model Overview

A unified text-processing framework based on the Transformer that achieves strong performance across a wide range of NLP tasks through transfer learning
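The text-to-text framing means every task is expressed as a string input and a string output, typically by prepending a task prefix. A minimal sketch of that framing is below; the prefixes follow the ones used in the original T5 paper, and the `to_text_to_text` helper is hypothetical, not part of any library. Note that T5 v1.1 is pretrained without supervised mixing, so task prefixes only become meaningful after fine-tuning.

```python
# Illustrative sketch: T5 casts every NLP task as text-to-text by
# prefixing the input with a task description; the model then emits
# the answer as plain text. Prefixes here mirror the T5 paper.
def to_text_to_text(task: str, text: str) -> str:
    prefixes = {
        "summarize": "summarize: ",
        "cola": "cola sentence: ",
        "translate_en_de": "translate English to German: ",
    }
    return prefixes[task] + text

# The same model handles different tasks purely via the input string:
summarization_input = to_text_to_text("summarize", "Long news article ...")
translation_input = to_text_to_text("translate_en_de", "That is good.")
```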

Model Features

GEGLU Activation Function
Replaces ReLU with GEGLU in the feed-forward hidden layers to improve model expressiveness
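GEGLU gates a linear projection of the input with a GELU-activated projection: GEGLU(x) = GELU(xW) ⊙ (xV). A minimal stdlib-only sketch (weight matrices passed as lists of column vectors; this is an illustration, not the model's actual implementation):

```python
import math

def gelu(x: float) -> float:
    # Exact GELU via the Gaussian CDF: x * Phi(x)
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def geglu(x, w_cols, v_cols):
    """GEGLU for one input vector x.
    w_cols / v_cols are the columns of the two projection matrices W and V;
    each hidden unit is GELU(x . w_i) * (x . v_i)."""
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))
    return [gelu(dot(x, wi)) * dot(x, vi) for wi, vi in zip(w_cols, v_cols)]
```

The extra linear branch doubles the feed-forward input projections, which is one reason T5 v1.1 rebalances d_ff relative to the original T5.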
Pure Unsupervised Pretraining
Pretrained solely on the C4 dataset, with no downstream task data mixed into pretraining
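T5's unsupervised objective is span corruption: random spans of the input are replaced by sentinel tokens, and the target reconstructs the dropped spans. A simplified sketch under the assumption that spans are pre-chosen, sorted, and non-overlapping (the real objective samples them randomly):

```python
def span_corrupt(tokens, spans):
    """Sketch of T5's span-corruption objective.
    Each (start, length) span is replaced in the input by a sentinel
    token <extra_id_N>; the target lists each sentinel followed by the
    tokens it replaced, closed by a final sentinel."""
    inp, tgt, i, sid = [], [], 0, 0
    for start, length in spans:
        inp.extend(tokens[i:start])          # keep tokens before the span
        sentinel = f"<extra_id_{sid}>"
        inp.append(sentinel)                 # mask the span in the input
        tgt.append(sentinel)                 # announce the span in the target
        tgt.extend(tokens[start:start + length])
        i = start + length
        sid += 1
    inp.extend(tokens[i:])                   # keep the tail
    tgt.append(f"<extra_id_{sid}>")          # closing sentinel
    return inp, tgt

# Example from the T5 paper's illustration:
tokens = "Thank you for inviting me to your party last week".split()
inp, tgt = span_corrupt(tokens, [(2, 1), (6, 2)])
# inp: Thank you <extra_id_0> inviting me to <extra_id_1> last week
# tgt: <extra_id_0> for <extra_id_1> your party <extra_id_2>
```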
Parameter Separation Strategy
The embedding layer and the output classifier layer no longer share parameters, improving model flexibility
Rebalanced Scaling Architecture
Improves large-model quality by using a larger d_model together with smaller num_heads and d_ff than the original T5
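The rebalancing is visible in the shape hyperparameters. The numbers below are taken from the published Hugging Face config for google/t5-v1_1-xxl; treat them as assumptions to verify against the official checkpoint:

```python
# Reported t5-v1_1-xxl shape hyperparameters (assumed from the
# published config; verify before relying on them).
config = {"d_model": 4096, "d_ff": 10240, "num_heads": 64, "num_layers": 24}

head_dim = config["d_model"] // config["num_heads"]   # per-head dimension
ff_ratio = config["d_ff"] / config["d_model"]         # feed-forward expansion

# Contrast with the original T5-11B, which paired a small d_model (1024)
# with an extremely wide d_ff (65536); v1.1 keeps the ratio modest.
```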

Model Capabilities

Text generation
Text classification
Question answering systems
Summarization
Machine translation
Text rewriting

Use Cases

Text summarization
News summarization
Condenses long articles into concise summaries of the key information
Achieved state-of-the-art results on the CNN/Daily Mail dataset
Intelligent Q&A
Open-domain question answering
Answers natural language questions based on text content
Performs strongly on benchmarks such as Natural Questions
Text classification
Sentiment analysis
Determines text sentiment (positive/negative)
Highly competitive on the GLUE benchmark