🚀 T5-Efficient-MINI (Deep-Narrow version)
T5-Efficient-MINI is a variation of Google's original T5 that follows the T5 model architecture. It is a pretrained-only checkpoint released with the paper Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers by Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, and Donald Metzler. The paper shows that a Deep-Narrow architecture can offer better downstream performance than other architectures with a similar parameter count.
✨ Features
- Deep-Narrow Architecture: A Deep-Narrow model architecture is more favorable for downstream performance compared to other architectures of similar parameter count.
- Model Depth Definition: Model depth is defined as the number of transformer blocks stacked sequentially; a sequence of word embeddings is therefore processed by one transformer block after another.
📦 Installation
No installation steps are provided in the original document.
💻 Usage Examples
No code examples are provided in the original document.
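Since the original card ships without a usage snippet, the following is a minimal sketch only, assuming the checkpoint is hosted on the Hugging Face Hub as `google/t5-efficient-mini` and loaded with the `transformers` library (install via `pip install transformers torch sentencepiece`). Because this is a pretrained-only checkpoint, the sketch merely probes the span-infilling behaviour learned during pre-training; for practical tasks the model must be fine-tuned first (see below).

```python
# Minimal sketch; the Hub id "google/t5-efficient-mini" is an assumption.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "google/t5-efficient-mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# Probe the span-infilling (masked span) behaviour learned during pre-training.
text = "The <extra_id_0> walks in <extra_id_1> park."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```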
📚 Documentation
Details model architecture
This model checkpoint - t5-efficient-mini - is of model type Mini with no variations. It has 31.23 million parameters, requiring ca. 124.92 MB of memory in full precision (fp32) or 62.46 MB in half precision (fp16 or bf16).
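The memory figures follow directly from the parameter count at 4 bytes per parameter in fp32 and 2 bytes per parameter in fp16/bf16; a quick back-of-the-envelope check (weights only, excluding activations and optimizer state):

```python
# Weight-memory estimate only; activations and optimizer state are excluded.
params = 31.23e6              # 31.23 million parameters
fp32_mb = params * 4 / 1e6    # 4 bytes/param -> ~124.92 MB
fp16_mb = params * 2 / 1e6    # 2 bytes/param -> ~62.46 MB
print(f"fp32: {fp32_mb:.2f} MB, fp16/bf16: {fp16_mb:.2f} MB")
```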
The following table shows a summary of the original T5 model architectures:
| Model | nl (el/dl) | ff    | dm   | kv  | nh  | #Params |
| ----- | ---------- | ----- | ---- | --- | --- | ------- |
| Tiny  | 4/4        | 1024  | 256  | 32  | 4   | 16M     |
| Mini  | 4/4        | 1536  | 384  | 32  | 8   | 31M     |
| Small | 6/6        | 2048  | 512  | 32  | 8   | 60M     |
| Base  | 12/12      | 3072  | 768  | 64  | 12  | 220M    |
| Large | 24/24      | 4096  | 1024 | 64  | 16  | 738M    |
| Xl    | 24/24      | 16384 | 1024 | 128 | 32  | 3B      |
| XXl   | 24/24      | 65536 | 1024 | 128 | 128 | 11B     |
The following table explains the abbreviations used:
| Abbreviation | Definition |
| ------------ | ---------- |
| nl  | Number of transformer blocks (depth) |
| dm  | Dimension of embedding vector (output vector of a transformer block) |
| kv  | Dimension of key/value projection matrix |
| nh  | Number of attention heads |
| ff  | Dimension of intermediate vector within a transformer block (size of feed-forward projection matrix) |
| el  | Number of transformer blocks in the encoder (encoder depth) |
| dl  | Number of transformer blocks in the decoder (decoder depth) |
| sh  | Signifies that attention heads are shared |
| skv | Signifies that key/value projection matrices are tied |
If a model checkpoint has no specific el or dl value, both the encoder depth and the decoder depth correspond to nl.
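As a hedged illustration of how these abbreviations map onto the checkpoint's configuration in the `transformers` library (attribute names taken from `T5Config`; the Hub id `google/t5-efficient-mini` is assumed), the values expected from the Mini row of the table above are shown in the comments:

```python
# Inspect the architecture hyperparameters; expected values per the Mini row above.
from transformers import T5Config

config = T5Config.from_pretrained("google/t5-efficient-mini")  # assumed Hub id
print("el (encoder depth):", config.num_layers)            # 4
print("dl (decoder depth):", config.num_decoder_layers)    # 4
print("dm (model dim):", config.d_model)                   # 384
print("kv (key/value dim):", config.d_kv)                  # 32
print("nh (attention heads):", config.num_heads)           # 8
print("ff (feed-forward dim):", config.d_ff)               # 1536
```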
Pre-Training
The checkpoint was pretrained on the Colossal, Cleaned version of Common Crawl (C4) for 524288 steps using the span-based masked language modeling (MLM) objective.
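To make the objective concrete, here is an illustrative sketch of T5-style span corruption (not the exact pre-training pipeline): random spans of the input are replaced by sentinel tokens `<extra_id_0>`, `<extra_id_1>`, ..., and the target reconstructs the dropped spans in order.

```python
# Illustration of span-based MLM (span corruption) as used for T5 pre-training;
# the sentence and the chosen spans are illustrative only.
original = "Thank you for inviting me to your party last week ."
corrupted_input = "Thank you <extra_id_0> me to your party <extra_id_1> week ."
target = "<extra_id_0> for inviting <extra_id_1> last <extra_id_2>"
print("input: ", corrupted_input)
print("target:", target)
```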
Fine-Tuning
⚠️ Important Note
This model is a pretrained checkpoint and has to be fine-tuned for practical usage. The checkpoint was pretrained in English and is therefore only useful for English NLP tasks.
You can follow one of the following framework-specific examples to fine-tune the model (a minimal sketch is also given after this list):
- PyTorch:
- TensorFlow:
- JAX/Flax:
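As a hedged placeholder for those examples, the following minimal PyTorch fine-tuning sketch uses the `transformers` library on a toy summarization-style pair; the Hub id, data, and hyperparameters are illustrative assumptions rather than the authors' recipe.

```python
# Minimal PyTorch fine-tuning sketch (illustrative only).
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "google/t5-efficient-mini"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# Toy (source, target) pairs; replace with a real English dataset.
pairs = [
    ("summarize: The quick brown fox jumps over the lazy dog near the river bank.",
     "A fox jumps over a dog."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model.train()
for source, target in pairs:
    enc = tokenizer(source, return_tensors="pt", truncation=True)
    labels = tokenizer(target, return_tensors="pt", truncation=True).input_ids
    loss = model(**enc, labels=labels).loss  # standard seq2seq cross-entropy
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"loss: {loss.item():.4f}")

model.save_pretrained("./t5-efficient-mini-finetuned")
tokenizer.save_pretrained("./t5-efficient-mini-finetuned")
```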
Downstream Performance
TODO: Add table if available
Computational Complexity
TODO: Add table if available
More information
We strongly recommend reading the original paper Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers carefully to get a more nuanced understanding of this model checkpoint. As explained in the following issue, checkpoints that include the sh or skv architecture variations have not been ported to Transformers, as they are probably of limited practical use and lack a more detailed description. Those checkpoints are kept here as they might be ported in the future.
📄 License
This model is licensed under the Apache 2.0 (apache-2.0) license.