T5 Base LM-Adapt
The T5 v1.1 LM-Adapted model is an improved text-generation model based on the T5 architecture. It uses GEGLU activation functions and adds a language modeling objective, which markedly improves prompt-tuning effectiveness.
Downloads: 1,062
Release date: 3/2/2022
Model Overview
This model is an improved version of the base T5 for text-to-text transformation tasks, with stronger language modeling capabilities gained through architectural changes and adjusted training objectives.
Model Features
GEGLU activation function
Replaces the original ReLU with the GEGLU activation function in the feed-forward hidden layers, improving model expressiveness
Dropout-free pre-training
Dropout is disabled during pre-training to improve model quality; it should be re-enabled during fine-tuning
Dual-objective training
Pre-trained with a denoising (span-corruption) objective and then further adapted with a language modeling objective, strengthening language understanding and generation
Parameter optimization
Adjusts the model's shape: a larger d_model with fewer attention heads and a smaller feed-forward dimension
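The GEGLU feature above can be illustrated with a minimal, dependency-free sketch: GEGLU gates a linear projection with a GELU-activated projection (elementwise product), in place of a single ReLU-activated projection. The weight matrices here are hypothetical stand-ins, not the model's actual parameters.

```python
import math

def gelu(x):
    # Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def geglu(x, w_gate, w_value):
    # GEGLU: gelu(x @ W_gate) * (x @ W_value) -- a GELU-gated projection
    # multiplied elementwise with a plain linear projection.
    gate = [gelu(sum(xi * wij for xi, wij in zip(x, col))) for col in zip(*w_gate)]
    value = [sum(xi * wij for xi, wij in zip(x, col)) for col in zip(*w_value)]
    return [g * v for g, v in zip(gate, value)]

# Toy example with identity weights so the gating effect is visible.
out = geglu([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
```

In a real T5 v1.1 feed-forward block this doubles the input-projection parameters (two matrices instead of one), which is one reason the feed-forward dimension was reduced relative to the original T5.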
Model Capabilities
Text generation
Text transformation
Language modeling
Prompt tuning
Transfer learning
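The two training objectives mentioned above frame data differently, which a small sketch can make concrete: denoising replaces a span with a sentinel token and asks the model to reconstruct it, while LM adaptation asks the model to continue a prefix. The helper functions and the word-level "tokens" are illustrative simplifications, not T5's actual SentencePiece pipeline.

```python
def span_corruption_example(tokens, span_start, span_len):
    # T5-style denoising: mask a span with a sentinel; the target
    # reproduces the masked span, delimited by sentinels.
    inp = tokens[:span_start] + ["<extra_id_0>"] + tokens[span_start + span_len:]
    tgt = ["<extra_id_0>"] + tokens[span_start:span_start + span_len] + ["<extra_id_1>"]
    return " ".join(inp), " ".join(tgt)

def lm_example(tokens, prefix_len):
    # LM adaptation: given a prefix, predict the continuation.
    return " ".join(tokens[:prefix_len]), " ".join(tokens[prefix_len:])

tokens = "the quick brown fox jumps".split()
inp, tgt = span_corruption_example(tokens, 1, 2)
prefix, cont = lm_example(tokens, 2)
```

The extra LM-adaptation steps move the model's output distribution toward open-ended continuation, which is why this variant is better suited to generation and prompt tuning than the purely denoising-trained checkpoint.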
Use Cases
Text generation
Automatic summarization
Condenses long texts into concise summaries
Achieves state-of-the-art results on summarization benchmarks
Question answering
Answers questions based on text content
Performs strongly across multiple QA tasks
Text transformation
Text classification
Classifies input text into predefined categories
Achieves competitive results on text classification benchmarks
Language translation
Converts text between languages
Supports translation between multiple language pairs
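Since the card highlights prompt-tuning effectiveness, the mechanism is worth sketching: in prompt tuning (Lester et al., 2021), a small matrix of learnable "soft prompt" vectors is prepended to the frozen token embeddings, and only those vectors are trained. The dimensions and random initialization below are illustrative assumptions, not the published recipe.

```python
import numpy as np

def prepend_soft_prompt(prompt_emb, token_emb):
    # Prompt tuning: learnable prompt vectors are concatenated in front of
    # the frozen token embeddings; only prompt_emb receives gradient updates.
    return np.concatenate([prompt_emb, token_emb], axis=0)

rng = np.random.default_rng(0)
d_model, prompt_len, seq_len = 768, 20, 5          # t5-base uses d_model=768
prompt = rng.normal(size=(prompt_len, d_model))     # trainable parameters
tokens = rng.normal(size=(seq_len, d_model))        # frozen embedding lookup
inputs = prepend_soft_prompt(prompt, tokens)        # shape (25, 768)
```

Because only the prompt matrix is updated, a single frozen LM-adapted checkpoint can serve many tasks with a few thousand trainable parameters per task, which is the scenario this model variant was tuned for.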