🚀 T5-Spanish-Efficient-TINY (NEW Deep-Narrow Spanish Version - March 2024)
T5-Efficient-TINY is a variation of Google's original T5 that follows the T5 model architecture. This Spanish version has been trained by Javier Albarracín from Quantico AI. The architecture was originally shared in the paper *Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers* by Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, and Donald Metzler.
This version of the model has been trained from scratch on a Spanish dataset. It NEEDS FINE-TUNING, as it has not been trained on any specific task. Its advantage is that it is in Spanish and can be fine-tuned for simple tasks. Due to its relatively low complexity and a weight of under 29 MB, it is well suited to CPU usage.
It comes with its own Spanish tokenizer (lowercase letters only) with a vocabulary of 5,000 tokens.
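As a quick, hypothetical sketch of how the tokenizer can be inspected with the Hugging Face transformers library (the repository id `jalbarracin/T5-spanish-efficient-tiny` below is an assumption; replace it with the actual id of this model on the Hub):

```python
from transformers import AutoTokenizer

# Assumed repository id -- replace with the actual Hub id of this model.
MODEL_ID = "jalbarracin/T5-spanish-efficient-tiny"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
print(tokenizer.vocab_size)  # expected to be around 5000

# The vocabulary is lowercase-only, so lowercase your text before tokenizing.
print(tokenizer.tokenize("este modelo necesita fine-tuning".lower()))
```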
✨ Features
Model Architecture Details
This model - T5-spanish-efficient-tiny - is of the Tiny type, with variations in the embedding and feed-forward dimensions and in the decoder depth (see the tables below). It has roughly 7 million parameters and requires about 29 MB of memory in full precision (fp32) or 15 MB in half precision (fp16 or bf16).
This Spanish model has been created with a lighter configuration than the original Tiny variant:
| Model | nl (el/dl) | ff  | dm  | kv | nh | #Params |
|-------|------------|-----|-----|----|----|---------|
| This  | 4/3        | 512 | 320 | 64 | 4  | 7M      |
A summary of the original T5 model can be seen below:
| Model | nl (el/dl) | ff    | dm   | kv  | nh  | #Params |
|-------|------------|-------|------|-----|-----|---------|
| Tiny  | 4/4        | 1024  | 256  | 32  | 4   | 16M     |
| Mini  | 4/4        | 1536  | 384  | 32  | 8   | 31M     |
| Small | 6/6        | 2048  | 512  | 32  | 8   | 60M     |
| Base  | 12/12      | 3072  | 768  | 64  | 12  | 220M    |
| Large | 24/24      | 4096  | 1024 | 64  | 16  | 738M    |
| Xl    | 24/24      | 16384 | 1024 | 128 | 32  | 3B      |
| XXl   | 24/24      | 65536 | 1024 | 128 | 128 | 11B     |
The abbreviations used:
| Property | Details |
|----------|---------|
| nl  | Number of transformer blocks (depth) |
| dm  | Dimension of embedding vector (output vector of the transformer block) |
| kv  | Dimension of key/value projection matrix |
| nh  | Number of attention heads |
| ff  | Dimension of intermediate vector within transformer block (size of feed-forward projection matrix) |
| el  | Number of transformer blocks in the encoder (encoder depth) |
| dl  | Number of transformer blocks in the decoder (decoder depth) |
| sh  | Signifies that attention heads are shared |
| skv | Signifies that key-value projection matrices are tied |
If a model checkpoint has no specific el or dl value, both the number of encoder layers and the number of decoder layers correspond to nl.
Pre-Training
It has been pre-trained on 2 million randomly sampled records from the Spanish version of the MSMARCO dataset.
Fine-Tuning
⚠️ Important Note
This model requires fine-tuning to work. Here are some examples of how to do it:
PyTorch:
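A minimal fine-tuning sketch with PyTorch and the Hugging Face transformers library is shown below. The repository id, the toy sentence pair, and the hyperparameters are illustrative assumptions; in practice you would iterate over a full dataset with a DataLoader for several epochs.

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Assumed repository id -- replace with the actual Hub id of this model.
MODEL_ID = "jalbarracin/T5-spanish-efficient-tiny"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = T5ForConditionalGeneration.from_pretrained(MODEL_ID)
model.train()

# Toy task: the prefix and the sentence pair are made up for this example.
sources = ["resumir: el modelo es pequeño y funciona bien en cpu"]
targets = ["modelo pequeño para cpu"]

inputs = tokenizer(sources, return_tensors="pt", padding=True, truncation=True)
labels = tokenizer(targets, return_tensors="pt", padding=True, truncation=True).input_ids
# Padding tokens in the labels should not contribute to the loss.
labels[labels == tokenizer.pad_token_id] = -100

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)

for step in range(10):  # a real run would iterate over a DataLoader for several epochs
    outputs = model(**inputs, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

model.save_pretrained("t5-spanish-tiny-finetuned")
tokenizer.save_pretrained("t5-spanish-tiny-finetuned")
```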
TensorFlow:
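A comparable sketch in TensorFlow, under the same assumptions. `from_pt=True` converts the PyTorch weights on the fly if no native TensorFlow checkpoint is published.

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFT5ForConditionalGeneration

MODEL_ID = "jalbarracin/T5-spanish-efficient-tiny"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = TFT5ForConditionalGeneration.from_pretrained(MODEL_ID, from_pt=True)

# Toy sentence pair, for illustration only.
sources = ["resumir: el modelo es pequeño y funciona bien en cpu"]
targets = ["modelo pequeño para cpu"]

inputs = tokenizer(sources, return_tensors="tf", padding=True, truncation=True)
labels = tokenizer(targets, return_tensors="tf", padding=True, truncation=True).input_ids

optimizer = tf.keras.optimizers.Adam(learning_rate=5e-4)

for step in range(10):  # a real run would iterate over a tf.data.Dataset
    with tf.GradientTape() as tape:
        outputs = model(
            input_ids=inputs.input_ids,
            attention_mask=inputs.attention_mask,
            labels=labels,
        )
        loss = tf.reduce_mean(outputs.loss)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

model.save_pretrained("t5-spanish-tiny-finetuned-tf")
```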
JAX/Flax:
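And a JAX/Flax sketch of a single training step with optax, again under the same assumptions. The token-shifting helper and the toy data are simplified for illustration; a real training loop would jit the step function and iterate over batches.

```python
import jax
import jax.numpy as jnp
import optax
from transformers import AutoTokenizer, FlaxT5ForConditionalGeneration

MODEL_ID = "jalbarracin/T5-spanish-efficient-tiny"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# from_pt=True converts PyTorch weights if no native Flax checkpoint exists.
model = FlaxT5ForConditionalGeneration.from_pretrained(MODEL_ID, from_pt=True)

# Toy sentence pair, for illustration only.
sources = ["resumir: el modelo es pequeño y funciona bien en cpu"]
targets = ["modelo pequeño para cpu"]

inputs = tokenizer(sources, return_tensors="np", padding=True, truncation=True)
labels = jnp.array(tokenizer(targets, return_tensors="np", padding=True, truncation=True).input_ids)

def shift_right(labels, decoder_start_token_id):
    # T5 decoder inputs are the labels shifted one position to the right.
    shifted = jnp.zeros_like(labels).at[:, 0].set(decoder_start_token_id)
    return shifted.at[:, 1:].set(labels[:, :-1])

decoder_input_ids = shift_right(labels, model.config.decoder_start_token_id)

def loss_fn(params):
    logits = model(
        input_ids=jnp.array(inputs.input_ids),
        attention_mask=jnp.array(inputs.attention_mask),
        decoder_input_ids=decoder_input_ids,
        params=params,
    ).logits
    return optax.softmax_cross_entropy_with_integer_labels(logits, labels).mean()

tx = optax.adamw(learning_rate=5e-4)
opt_state = tx.init(model.params)

# One illustrative gradient step; a real loop would jax.jit this and iterate over batches.
grads = jax.grad(loss_fn)(model.params)
updates, opt_state = tx.update(grads, opt_state, model.params)
model.params = optax.apply_updates(model.params, updates)
```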
📄 License
This project is licensed under the Apache-2.0 license.