
T5 Large LM-Adapt

Developed by Google
The LM-adapted version of T5 Version 1.1 is a text generation model based on the T5 architecture, further trained with a language modeling objective to make it better suited to prompt tuning.

Model Overview

This model is an improved version of T5 Version 1.1 that has been further adapted with a language modeling objective, making it suitable for a range of text generation and understanding tasks.
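
A minimal usage sketch with Hugging Face Transformers is shown below; the hub ID `google/t5-large-lm-adapt` and the example prompt are assumptions for illustration, not details taken from this page.

```python
# Minimal sketch: load the LM-adapted checkpoint and generate from a prompt.
# The checkpoint name below is assumed; verify it on the model hub before use.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "google/t5-large-lm-adapt"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Because of the LM adaptation, the model continues a natural-language prompt
# rather than expecting a span-corruption style input.
prompt = "Translate English to German: The house is wonderful."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```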

Model Features

GEGLU Activation Function
Uses the GEGLU activation function in the feed-forward hidden layers instead of ReLU, which improves model quality.
No Dropout Pre-training
Dropout was turned off during pre-training to improve quality; it should be re-enabled during fine-tuning.
Pure C4 Dataset Pre-training
Pre-trained exclusively on the C4 dataset without mixing downstream task data to maintain training data purity.
Parameter Separation
No parameter sharing between embedding and classifier layers, enhancing model flexibility.
Improved Model Architecture
Uses a larger `d_model` with smaller `num_heads` and `d_ff` than the original T5; these choices are visible in the model configuration (see the sketch after this list).
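
The sketch below shows one way to inspect these architectural choices through the Transformers configuration object; the checkpoint ID is an assumption carried over from the example above.

```python
# Minimal sketch: the features listed above surface directly in T5Config.
from transformers import T5Config

config = T5Config.from_pretrained("google/t5-large-lm-adapt")

print(config.feed_forward_proj)    # gated GELU (GEGLU) feed-forward layers
print(config.dropout_rate)         # dropout rate the loaded model will use
print(config.tie_word_embeddings)  # False -> embedding and classifier weights are separate
print(config.d_model, config.num_heads, config.d_ff)  # larger d_model, smaller num_heads/d_ff
```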

Model Capabilities

Text generation
Text understanding
Question answering
Summarization
Text classification

Use Cases

Natural Language Processing
Prompt Tuning
Because it is further trained with a language modeling objective, the model performs better when used for prompt tuning (see the sketch after this list).
Improved prompt tuning effectiveness
Text Generation
Suitable for generating coherent, contextually relevant text.
High-quality text generation
Question Answering
Can be used to build question answering systems that respond to text-based queries.
Accurate answers to user questions
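
As a rough illustration of the prompt tuning use case, the sketch below attaches trainable soft prompt tokens to the frozen model using the PEFT library; the checkpoint ID and the number of virtual tokens are illustrative assumptions, not values from this page.

```python
# Minimal prompt tuning sketch with PEFT: only the virtual prompt embeddings
# are trained, while the T5 weights stay frozen.
from transformers import AutoTokenizer, T5ForConditionalGeneration
from peft import PromptTuningConfig, TaskType, get_peft_model

model_name = "google/t5-large-lm-adapt"  # assumed hub ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

peft_config = PromptTuningConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    num_virtual_tokens=100,  # illustrative choice
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the soft prompt parameters are trainable
```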