T5 XXL LM-Adapt
The LM-adapted version of T5 Version 1.1 (XXL) is a large-scale encoder-decoder language model based on the T5 architecture, further trained on a language modeling objective, which improves its performance under prompt tuning.
Release Date: 3/2/2022
Model Overview
This model builds on T5 Version 1.1 and is additionally trained with a language modeling objective, strengthening its text generation and comprehension abilities and making it suitable for a wide range of NLP tasks.
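The checkpoint can be loaded with the Hugging Face Transformers library. The sketch below assumes the repo id "google/t5-xxl-lm-adapt"; adjust it if the checkpoint is hosted under a different path.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "google/t5-xxl-lm-adapt"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# The LM-adapted checkpoint continues a plain natural-language prompt,
# rather than requiring T5-style task prefixes.
prompt = "Renewable energy is becoming more popular because"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```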
Model Features
GEGLU Activation Function
Uses the GEGLU activation instead of ReLU in the feed-forward hidden layers, which improves model quality.
Language Model Adaptation
Trained for an additional 100K steps on a language modeling objective, making the checkpoint better suited to prompt tuning.
No Dropout Pre-training
Dropout is turned off during pre-training for higher quality and must be re-enabled for fine-tuning (see the configuration sketch after this feature list).
Independent Parameter Design
The embedding and classifier layers do not share parameters, giving the model more flexibility.
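A minimal sketch of restoring dropout when fine-tuning, since the checkpoint was pre-trained with dropout disabled. The value 0.1 is the conventional T5 dropout rate, not one prescribed by this model card, and the repo id is assumed as above.

```python
from transformers import T5Config, T5ForConditionalGeneration

model_id = "google/t5-xxl-lm-adapt"  # assumed repo id
# Re-enable dropout for fine-tuning (pre-training used dropout_rate=0.0).
config = T5Config.from_pretrained(model_id, dropout_rate=0.1)
model = T5ForConditionalGeneration.from_pretrained(model_id, config=config)
```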
Model Capabilities
Text generation
Text comprehension
Question answering
Summarization
Text classification
Machine translation
Use Cases
Text generation
Content creation
Automatically generate articles, stories, or other creative text content
Question answering
Intelligent customer service
Build customer service systems capable of understanding and answering user questions
Text summarization
News summarization
Automatically generate concise summaries of long articles
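As a sketch of the summarization use case, the snippet below prompts the model zero-shot with an instruction-style prompt. The prompt wording and article placeholder are illustrative only; in practice, prompt tuning or fine-tuning on summarization data would typically be used for better quality.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "google/t5-xxl-lm-adapt"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

article = "..."  # placeholder: the long article text to summarize
prompt = f"Summarize the following article:\n{article}\nSummary:"
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```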