t5-small-spanish-nahuatl: Open-Source Spanish-to-Nahuatl Translation Model

T5 Small Spanish Nahuatl

Developed by milmor

A Spanish-to-Nahuatl translation model fine-tuned from the T5 Transformer on approximately 29,000 bilingual sentences

Machine Translation

Transformers

Supports Multiple Languages

Open Source License: Apache-2.0

#Spanish-Nahuatl translation #Small-scale fine-tuning #Endangered language processing

Downloads 16

Release Time: 3/29/2022

Model Overview

This model translates Spanish into Nahuatl. It is fine-tuned from the T5-small checkpoint on Spanish-Nahuatl sentence data.

Model Features

Bilingual translation capability

Fine-tuned specifically for translation from Spanish into Nahuatl

Standardized data processing

Training data normalized using py-elotl's 'sep' method

Lightweight model

Based on T5-small architecture, suitable for deployment in resource-limited environments

Model Capabilities

Spanish to Nahuatl translation

Text-to-text conversion

Use Cases

Language translation

Daily phrase translation

Translate Spanish daily phrases into Nahuatl

For example, input 'muchas flores son blancas', output 'miak xochitl istak'

Cultural preservation

Assist in preserving and disseminating Nahuatl cultural heritage

🚀 t5-small-spanish-nahuatl

This is a fine-tuned T5 Transformer model for Spanish-to-Nahuatl translation.

🚀 Quick Start

This model is designed for Spanish-to-Nahuatl translation tasks. It is based on the T5 Transformer architecture and fine-tuned on 29,007 Spanish and Nahuatl sentences.

✨ Features

Fine-tuned on Diverse Data: The model is fine-tuned on 29,007 Spanish and Nahuatl sentences, comprising 12,890 samples collected from the web and 16,117 samples from the Axolotl dataset.
Normalized Dataset: The dataset is normalized using 'sep' normalization from [py-elotl](https://github.com/ElotlMX/py-elotl).
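
The normalization step can be sketched as follows. This is a minimal illustration, not the author's preprocessing script: the helper names `load_sep_normalizer` and `normalize_pairs` are hypothetical, the rule of dropping pairs with an empty side is an assumption, and the `elotl.nahuatl.orthography.Normalizer` call is assumed to match what the py-elotl README describes. The normalizer is passed in as a plain function so that the py-elotl dependency stays isolated.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (Spanish sentence, Nahuatl sentence)


def load_sep_normalizer() -> Callable[[str], str]:
    """Return py-elotl's 'sep' normalizer as a plain function.

    Assumption: py-elotl is installed and exposes
    elotl.nahuatl.orthography.Normalizer, as its README describes.
    """
    import elotl.nahuatl.orthography  # lazy import: optional dependency

    return elotl.nahuatl.orthography.Normalizer("sep").normalize


def normalize_pairs(
    pairs: Iterable[Pair], normalize: Callable[[str], str]
) -> List[Pair]:
    """Normalize the Nahuatl side of each pair.

    Pairs with an empty side are dropped (a hypothetical cleaning rule).
    """
    cleaned: List[Pair] = []
    for spanish, nahuatl in pairs:
        spanish, nahuatl = spanish.strip(), nahuatl.strip()
        if spanish and nahuatl:
            cleaned.append((spanish, normalize(nahuatl)))
    return cleaned
```

With py-elotl installed, `normalize_pairs(pairs, load_sep_normalizer())` would apply the 'sep' orthography to every Nahuatl sentence while leaving the Spanish side untouched.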

📦 Installation

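The original model card lists no installation steps. The usage example relies on the Hugging Face transformers library with a PyTorch backend (it requests 'pt' tensors), so both need to be installed first.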

💻 Usage Examples

Basic Usage

from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub
model = AutoModelForSeq2SeqLM.from_pretrained('milmor/t5-small-spanish-nahuatl')
tokenizer = AutoTokenizer.from_pretrained('milmor/t5-small-spanish-nahuatl')

model.eval()  # inference mode (disables dropout)
sentence = 'muchas flores son blancas'
# Prepend the task prefix to the source sentence before tokenizing
input_ids = tokenizer('translate Spanish to Nahuatl: ' + sentence, return_tensors='pt').input_ids
outputs = model.generate(input_ids)
# Decode the generated token ids back into text
outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
# outputs = miak xochitl istak
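
To translate several sentences in one call, the basic usage can be wrapped in a small helper. The sketch below is an illustration under assumptions rather than part of the model card: the function names `build_prompts` and `translate_batch` are hypothetical, and the `max_new_tokens=64` cap is an arbitrary choice. The heavy imports are deferred so that the prompt helper can be used on its own.

```python
from typing import List

# Task prefix taken from the model card's usage example.
PREFIX = "translate Spanish to Nahuatl: "
MODEL_NAME = "milmor/t5-small-spanish-nahuatl"


def build_prompts(sentences: List[str]) -> List[str]:
    """Prepend the task prefix to each whitespace-trimmed sentence."""
    return [PREFIX + sentence.strip() for sentence in sentences]


def translate_batch(sentences: List[str], max_new_tokens: int = 64) -> List[str]:
    """Translate a list of Spanish sentences in a single generate() call.

    max_new_tokens=64 is an assumed cap, not a value from the model card.
    """
    # Imported lazily so build_prompts works without torch/transformers.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
    model.eval()

    # padding=True pads shorter prompts so the batch forms one tensor;
    # the attention mask tells the model to ignore the padding.
    batch = tokenizer(build_prompts(sentences), return_tensors="pt", padding=True)
    with torch.no_grad():
        generated = model.generate(**batch, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling `translate_batch(['muchas flores son blancas'])` downloads the checkpoint on first use and returns a list with one translation per input sentence.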

📚 Documentation

Evaluation results

The model is evaluated on 400 validation sentences.

Validation loss: 1.36

⚠️ Important Note

Since the Axolotl corpus contains multiple misalignments, the true validation loss is slightly better (lower) than the reported value. These misalignments also introduce noise into training.
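
For a more intuitive reading of the reported loss, it can be converted to perplexity. The short calculation below assumes the reported value is a mean per-token cross-entropy in nats, which is the usual convention for T5 training in the transformers library but is not stated in the model card.

```python
import math

# Validation loss reported in the model card (400 validation sentences).
validation_loss = 1.36

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is simply its exponential.
perplexity = math.exp(validation_loss)

print(f"perplexity: {perplexity:.2f}")  # prints "perplexity: 3.90"
```

Under that assumption, the model is on average about as uncertain as if it were choosing among roughly four equally likely tokens at each step.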

🔧 Technical Details

This model is a T5 Transformer ([t5-small](https://huggingface.co/t5-small)) fine-tuned on 29,007 Spanish and Nahuatl sentences. The dataset combines web-collected samples with the Axolotl dataset and is normalized using 'sep' normalization from [py-elotl](https://github.com/ElotlMX/py-elotl).

📄 License

This project is licensed under the Apache-2.0 license.

📚 References

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer.
Ximena Gutierrez-Vasques, Gerardo Sierra, and Hernandez Isaac. 2016. Axolotl: a web accessible parallel corpus for Spanish-Nahuatl. In International Conference on Language Resources and Evaluation (LREC).

Created by Emilio Morales.
