T5-XL LM-Adapt
T5 1.1 LM-Adapt is an improved version of the original T5 model: it uses the GEGLU activation function, removes embedding/classifier parameter sharing, and is additionally adapted to a language modeling objective.
Release Time: 3/2/2022
Model Overview
This model adapts the T5 1.1 architecture to language modeling tasks, improving prompt-tuning performance through a revised activation function and training strategy.
Model Features
GEGLU activation function
Uses the GEGLU activation function in the feed-forward hidden layer instead of ReLU, improving model expressiveness
No Dropout pre-training
Disables dropout during pre-training to improve quality; dropout should be re-enabled during fine-tuning
Pure C4 dataset training
Pre-trained exclusively on the C4 dataset, without mixing in downstream task data, keeping pre-training free of downstream supervision
Parameter decoupling
Removes parameter sharing between the embedding and classifier layers, increasing model flexibility
Two-stage pre-training
First pre-trained with T5's span-corruption (denoising) objective, then further trained for additional steps on a language modeling objective (the "LM adaptation" step)
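The GEGLU feed-forward block mentioned above can be sketched in a few lines. This is a minimal illustration, not the model's actual implementation: the weight names (`W_gate`, `W_up`, `W_down`) and the tanh approximation of GELU are assumptions for clarity.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU (one common formulation)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def geglu_ffn(x, W_gate, W_up, W_down):
    """GEGLU feed-forward block: a GELU-gated linear unit followed by a
    down-projection, replacing the single ReLU layer of the original T5."""
    gate = gelu(x @ W_gate)      # gating branch with GELU non-linearity
    up = x @ W_up                # plain linear branch
    return (gate * up) @ W_down  # elementwise gate, then project back to d_model

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32
x = rng.standard_normal((4, d_model))
W_gate = rng.standard_normal((d_model, d_ff))
W_up = rng.standard_normal((d_model, d_ff))
W_down = rng.standard_normal((d_ff, d_model))
out = geglu_ffn(x, W_gate, W_up, W_down)
print(out.shape)  # (4, 8)
```

Note the cost of the gate: GEGLU uses two input projections where a ReLU block uses one, which is why T5 1.1 shrinks `d_ff` relative to the original T5.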
Model Capabilities
Text generation
Text understanding
Transfer learning
Prompt tuning
Zero-shot learning
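The two pre-training objectives described under Model Features differ only in how (input, target) pairs are built from raw text. A simplified sketch (single masked span; real span corruption masks multiple spans with multiple sentinels):

```python
def span_corruption_example(tokens, span_start, span_len, sentinel="<extra_id_0>"):
    """Build an (input, target) pair for T5's denoising objective by
    replacing one span with a sentinel token (simplified to a single span)."""
    inp = tokens[:span_start] + [sentinel] + tokens[span_start + span_len:]
    tgt = [sentinel] + tokens[span_start:span_start + span_len]
    return inp, tgt

def lm_example(tokens, prefix_len):
    """Build a (prefix, continuation) pair for the language modeling
    objective used during LM adaptation: read the prefix, predict the rest."""
    return tokens[:prefix_len], tokens[prefix_len:]

tokens = "the quick brown fox jumps over the lazy dog".split()
print(span_corruption_example(tokens, 2, 2))
# (['the', 'quick', '<extra_id_0>', 'jumps', 'over', 'the', 'lazy', 'dog'],
#  ['<extra_id_0>', 'brown', 'fox'])
print(lm_example(tokens, 4))
# (['the', 'quick', 'brown', 'fox'], ['jumps', 'over', 'the', 'lazy', 'dog'])
```

The LM-adaptation stage matters for the capabilities listed above because generation and prompting are themselves left-to-right continuation tasks, closer to `lm_example` than to span filling.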
Use Cases
Natural Language Processing
Text summarization
Generates concise summaries of input text
The T5 family reports strong results on multiple summarization benchmarks
Question answering
Answers questions based on given context
Performs well across a range of QA tasks
Text classification
Classifies text into multiple categories
Performs well on benchmarks such as GLUE
Prompt engineering
Zero-shot learning
Performs unseen tasks through natural language prompts
LM adaptation substantially improves prompt-tuning performance over the original span-corruption checkpoint
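Prompt tuning itself is mechanically simple: a small matrix of learned "soft prompt" vectors is prepended to the frozen model's input embeddings, and only that matrix is updated. A minimal shape-level sketch (the sizes and names here are illustrative, not the model's actual configuration):

```python
import numpy as np

def prepend_soft_prompt(prompt_emb, token_emb):
    """Prompt tuning: prepend learned soft-prompt vectors to the token
    embeddings of a frozen model; only prompt_emb is trained."""
    return np.concatenate([prompt_emb, token_emb], axis=0)

rng = np.random.default_rng(0)
d_model, n_prompt, n_tokens = 16, 5, 10
prompt_emb = rng.standard_normal((n_prompt, d_model))  # trainable parameters
token_emb = rng.standard_normal((n_tokens, d_model))   # from the frozen embedding table
full_input = prepend_soft_prompt(prompt_emb, token_emb)
print(full_input.shape)  # (15, 16)
```

Because only `n_prompt * d_model` parameters are tuned per task, a single frozen LM-adapted checkpoint can serve many tasks, which is why this model card highlights prompt tuning as a capability.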