🚀 T5-Efficient-SMALL-NL22 (Deep-Narrow version)
T5-Efficient-SMALL-NL22 is a variant of Google's original T5, following the T5 model architecture. It's a pretrained-only checkpoint, released with the paper Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers by Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler. This model shows that a Deep-Narrow architecture can offer better downstream performance compared to other architectures with a similar number of parameters.
✨ Features
The paper suggests that a Deep-Narrow model architecture is more favorable for downstream performance than other architectures with a similar parameter count. Specifically, increasing the model's depth before other forms of scaling can lead to better Pareto-efficiency, with the relative gain diminishing after 32 to 36 layers.
⚠️ Important Note
Efficiency here refers to one of three compute dimensions: parameter count, FLOPs, or throughput (speed). The paper reports all three efficiency metrics, leaving the decision of which compute dimension to optimize for to the practitioner.
📚 Documentation
Model architecture details
This model checkpoint - t5-efficient-small-nl22 - is of the Small model type with the variation that nl is 22. It has 178.04 million parameters, requiring ca. 712.16 MB of memory in full precision (fp32) or 356.08 MB in half precision (fp16 or bf16).
| Property | Details |
|---|---|
| Model Type | Small |
| Number of Parameters | 178.04 million |
| Memory Requirement (fp32) | ca. 712.16 MB |
| Memory Requirement (fp16 or bf16) | ca. 356.08 MB |
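As a quick sanity check, the memory figures above follow directly from the parameter count. A minimal sketch, assuming 4 bytes per parameter in fp32, 2 bytes in fp16/bf16, and 1 MB = 10^6 bytes (framework overhead not included):

```python
# Back-of-the-envelope check of the memory figures in the table above.
n_params = 178.04e6           # parameter count of t5-efficient-small-nl22

fp32_mb = n_params * 4 / 1e6  # 4 bytes per parameter in full precision
fp16_mb = n_params * 2 / 1e6  # 2 bytes per parameter in half precision

print(f"fp32: {fp32_mb:.2f} MB")       # -> 712.16 MB
print(f"fp16/bf16: {fp16_mb:.2f} MB")  # -> 356.08 MB
```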
A summary of the original T5 model architectures:
| Model | nl (el/dl) | ff | dm | kv | nh | #Params |
|---|---|---|---|---|---|---|
| Tiny | 4/4 | 1024 | 256 | 32 | 4 | 16M |
| Mini | 4/4 | 1536 | 384 | 32 | 8 | 31M |
| Small | 6/6 | 2048 | 512 | 32 | 8 | 60M |
| Base | 12/12 | 3072 | 768 | 64 | 12 | 220M |
| Large | 24/24 | 4096 | 1024 | 64 | 16 | 738M |
| Xl | 24/24 | 16384 | 1024 | 128 | 32 | 3B |
| XXl | 24/24 | 65536 | 1024 | 128 | 128 | 11B |
Abbreviations used:
| Abbreviation | Definition |
|---|---|
| nl | Number of transformer blocks (depth) |
| dm | Dimension of embedding vector (output vector of transformer block) |
| kv | Dimension of key/value projection matrix |
| nh | Number of attention heads |
| ff | Dimension of intermediate vector within transformer block (size of feed-forward projection matrix) |
| el | Number of transformer blocks in the encoder (encoder depth) |
| dl | Number of transformer blocks in the decoder (decoder depth) |
| sh | Signifies that attention heads are shared |
| skv | Signifies that key-value projection matrices are tied |
If a model checkpoint lists no specific el or dl, both the encoder and decoder depth correspond to nl.
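For reference, these abbreviations map directly onto fields of `T5Config` in the `transformers` library. The sketch below fills in the values for a Small model with nl = 22, taken from the tables above; it is an illustration of the mapping, not the recommended way to load the released checkpoint (use `from_pretrained` for that):

```python
from transformers import T5Config

# Illustrative mapping of the abbreviations above onto T5Config fields
# for a Small model with nl = 22.
config = T5Config(
    num_layers=22,          # nl / el: encoder depth
    num_decoder_layers=22,  # dl: decoder depth (equals nl when not stated otherwise)
    d_model=512,            # dm: embedding / hidden dimension
    d_ff=2048,              # ff: feed-forward (intermediate) dimension
    d_kv=32,                # kv: key/value projection dimension per head
    num_heads=8,            # nh: number of attention heads
)
```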
Pre-Training
The checkpoint was pretrained on the Colossal, Cleaned version of Common Crawl (C4) for 524,288 steps using the span-based masked language modeling (MLM) objective.
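In the span-based MLM (span-corruption) objective, contiguous spans of the input are replaced by sentinel tokens and the model learns to reconstruct the dropped spans. A minimal sketch of what an input/target pair looks like, assuming the checkpoint is hosted on the Hugging Face Hub as `google/t5-efficient-small-nl22` with the standard T5 SentencePiece tokenizer:

```python
from transformers import AutoTokenizer

# Assumed Hub id for this checkpoint; adjust if it is hosted under a different name.
tokenizer = AutoTokenizer.from_pretrained("google/t5-efficient-small-nl22")

# Span corruption: masked spans in the input are replaced by sentinel tokens,
# and the target lists the dropped spans behind the same sentinels.
input_text = "The <extra_id_0> walks in <extra_id_1> park"
target_text = "<extra_id_0> cute dog <extra_id_1> the <extra_id_2>"

input_ids = tokenizer(input_text, return_tensors="pt").input_ids
labels = tokenizer(target_text, return_tensors="pt").input_ids
```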
Fine-Tuning
⚠️ Important Note
This model is a pretrained checkpoint and needs to be fine-tuned for practical use. It was pretrained on English text and is therefore only useful for English NLP tasks.
You can follow these examples to fine-tune the model:
- PyTorch:
- TensorFlow:
- JAX/Flax:
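Independent of those example scripts, loading the checkpoint for fine-tuning follows the standard `transformers` seq2seq pattern. A minimal PyTorch sketch, assuming the checkpoint is hosted on the Hugging Face Hub as `google/t5-efficient-small-nl22`; the task prefix and texts are placeholders for your own dataset:

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Assumed Hub id for this checkpoint; adjust if it is hosted under a different name.
model_id = "google/t5-efficient-small-nl22"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Placeholder seq2seq pair; in practice this comes from your fine-tuning dataset.
inputs = tokenizer(
    "summarize: The quick brown fox jumps over the lazy dog.",
    return_tensors="pt",
)
labels = tokenizer("A fox jumps over a dog.", return_tensors="pt").input_ids

# One training step of a standard seq2seq fine-tuning loop.
outputs = model(**inputs, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```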
📄 License
This project is licensed under the Apache-2.0 license.