Open-source model legal_t5_small_trans_fr_es - Free and accurate translation of legal texts from French to Spanish

Legal T5 Small Trans Fr Es

Developed by SEBIS

This model is designed for translating French legal texts into Spanish, based on the t5-small architecture, suitable for legal document translation tasks.

Tags: Machine Translation, Legal text translation, French to Spanish translation, Small parameter model

Downloads 18

Release Time : 3/2/2022

Model Overview

A French-to-Spanish legal text translation model based on the T5-small architecture, specifically optimized for legal domain texts.

Model Features

Legal domain optimization

Specifically trained for legal texts, capable of accurately translating legal terminology and complex sentence structures.

Efficient architecture

Uses the lightweight T5-small architecture (about 60 million parameters), which keeps computational requirements low while reaching a BLEU score of 51.16 on the translation test set.

Multi-dataset training

Trained on three major legal parallel corpora: JRC-ACQUIS, EUROPARL, and DCEP, covering a wide range of legal text types.

Model Capabilities

French-to-Spanish legal text translation

Handling legal terminology and complex sentence structures

Batch text translation

Use Cases

Legal document translation

EU legal document translation

Accurately translates French versions of EU legal documents into Spanish

BLEU score 51.16

Legal committee document processing

Translates meeting minutes, reports, and other legal-related documents for committees

🚀 legal_t5_small_trans_fr_es Model

A model designed for translating legal text from French to Spanish, offering efficient and accurate translation services.

🚀 Quick Start

The legal_t5_small_trans_fr_es model is tailored for translating legal text from French to Spanish. It was initially released in this repository and trained on three parallel corpora from JRC-ACQUIS, Europarl, and DCEP.

✨ Features

Based on t5-small: The model is built on the t5-small architecture, a scaled-down version of the T5 baseline model. It uses d_model = 512, d_ff = 2,048, 8-headed attention, and only 6 layers each in the encoder and decoder, resulting in about 60 million parameters.
Trained on Multiple Datasets: It was trained on the [JRC-ACQUIS](https://wt-public.emm4u.eu/Acquis/index_2.2.html), EUROPARL, and [DCEP](https://ec.europa.eu/jrc/en/language-technologies/dcep) datasets, which together consist of 5 million parallel texts.

📦 Installation

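The original model card lists no installation steps. The usage example in the next section imports the Hugging Face transformers library and runs on PyTorch, so both need to be available.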

💻 Usage Examples

Basic Usage

Here is how to use this model to translate legal text from French to Spanish in PyTorch:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TranslationPipeline

# AutoModelForSeq2SeqLM replaces the deprecated AutoModelWithLMHead for T5 models.
pipeline = TranslationPipeline(
    model=AutoModelForSeq2SeqLM.from_pretrained("SEBIS/legal_t5_small_trans_fr_es"),
    tokenizer=AutoTokenizer.from_pretrained(
        pretrained_model_name_or_path="SEBIS/legal_t5_small_trans_fr_es",
        do_lower_case=False,
        skip_special_tokens=True,
    ),
    device=0,  # first GPU; use device=-1 to run on CPU
)

fr_text = "commission des libertés civiles, de la justice et des affaires intérieures"

# The pipeline returns one dict per input, each with a "translation_text" key.
print(pipeline([fr_text], max_length=512))
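For larger document sets it can help to send the texts through the pipeline in fixed-size chunks. The following is a minimal sketch, not part of the original model card: `translate_in_batches` and its `batch_size` default are hypothetical names chosen for illustration. It assumes the translator callable returns a list of dicts with a `translation_text` key, which is the output format of the transformers translation pipeline.

```python
from typing import Callable, Dict, List


def translate_in_batches(
    texts: List[str],
    translate: Callable[[List[str]], List[Dict[str, str]]],
    batch_size: int = 8,  # illustrative default, not a value from the model card
) -> List[str]:
    """Translate texts chunk by chunk and return plain strings in input order."""
    translations: List[str] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # Each output dict is expected to carry a "translation_text" entry.
        for output in translate(batch):
            translations.append(output["translation_text"])
    return translations
```

With the pipeline from the example above, a call could look like `translate_in_batches(fr_texts, lambda batch: pipeline(batch, max_length=512))`, where `fr_texts` is a list of French strings.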

📚 Documentation

Intended uses & limitations

The model can be used to translate legal texts from French to Spanish.

Training data

The legal_t5_small_trans_fr_es model was trained on the [JRC-ACQUIS](https://wt-public.emm4u.eu/Acquis/index_2.2.html), EUROPARL, and [DCEP](https://ec.europa.eu/jrc/en/language-technologies/dcep) datasets, which together consist of 5 million parallel texts.

Training procedure

Overall Training: The model was trained on a single TPU Pod V3-8 for 250K steps in total, using sequence length 512 (batch size 4096). It has approximately 60 million parameters, matching the t5-small configuration, and uses the encoder-decoder architecture.
Optimizer: The optimizer is AdaFactor with an inverse square root learning rate schedule for pre-training.
Preprocessing: A unigram model was trained on 88M lines of text from the parallel corpus (covering all possible language pairs) to build the vocabulary (with byte pair encoding) used with this model.
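The inverse square root schedule mentioned above can be sketched as follows. This is an illustration under assumptions: the model card does not state the warm-up length or the base rate, so the `warmup_steps` value of 10,000 is borrowed from the schedule described for the original T5 models and may not match what was used here.

```python
import math


def inverse_sqrt_lr(step: int, warmup_steps: int = 10_000) -> float:
    """Learning rate 1 / sqrt(max(step, warmup_steps)).

    The rate stays constant during warm-up and then decays with the
    inverse square root of the step number. warmup_steps is an assumed
    value, not one taken from the model card.
    """
    return 1.0 / math.sqrt(max(step, warmup_steps))


# Rate at the start of training and at the final step reported above (250K).
for step in (1, 10_000, 250_000):
    print(step, inverse_sqrt_lr(step))
```

Under these assumptions the rate starts at 0.01 and has decayed to 0.002 by step 250,000.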

Evaluation results

On the translation test dataset, the model achieves the following result:

Model	BLEU score
legal_t5_small_trans_fr_es	51.16
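For readers unfamiliar with the metric, here is a minimal sketch of corpus-level BLEU with a single reference per sentence. It assumes whitespace tokenization and no smoothing. The model card does not say which BLEU implementation or tokenization produced the 51.16 figure, so this sketch is not expected to reproduce that number exactly, and `corpus_bleu` is a name chosen for illustration.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus BLEU on a 0-100 scale with one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp_tokens, n)
            ref_ngrams = ngram_counts(ref_tokens, n)
            # Clipped matches: an n-gram counts at most as often as in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += sum(hyp_ngrams.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty punishes output shorter than the reference.
    brevity = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity * math.exp(log_precision)
```

A hypothesis identical to its reference scores 100, and one that shares no words with it scores 0.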

🔧 Technical Details

The model is based on the t5-small model. It scales down the T5 baseline model by using d_model = 512, d_ff = 2,048, 8-headed attention, and only 6 layers each in the encoder and decoder. This variant has about 60 million parameters.
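The parameter count can be checked with a back-of-the-envelope calculation from the dimensions above. This is a sketch under assumptions that the model card does not state: a vocabulary of 32,128 entries, a per-head dimension of 64, 32 relative-position buckets, bias-free linear layers, and input/output embeddings that share weights, as in the stock t5-small configuration. The fine-tuned model's vocabulary may differ, which would change the total.

```python
def t5_param_count(
    vocab_size: int = 32128,  # assumed: stock t5-small vocabulary size
    d_model: int = 512,
    d_ff: int = 2048,
    num_heads: int = 8,
    d_kv: int = 64,           # assumed per-head dimension
    num_layers: int = 6,
    num_buckets: int = 32,    # assumed relative-position buckets
) -> int:
    """Approximate parameter count of a T5-style encoder-decoder."""
    inner = num_heads * d_kv
    attention = 4 * d_model * inner    # q, k, v, o projections without biases
    feed_forward = 2 * d_model * d_ff  # two bias-free linear layers
    norm = d_model                     # layer norm has a scale vector only
    encoder_layer = attention + feed_forward + 2 * norm
    decoder_layer = 2 * attention + feed_forward + 3 * norm  # adds cross-attention
    rel_bias = num_buckets * num_heads  # one relative-position table per stack
    encoder = num_layers * encoder_layer + rel_bias + norm   # plus final layer norm
    decoder = num_layers * decoder_layer + rel_bias + norm
    embedding = vocab_size * d_model    # shared by encoder, decoder, and output layer
    return embedding + encoder + decoder


print(f"{t5_param_count():,}")
```

With these assumed values the total is about 60.5 million, in line with the "about 60 million" figure quoted above.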

📄 License

There is no license information provided in the original document.

BibTeX entry and citation info

Created by Ahmed Elnaggar/@Elnaggar_AI | [LinkedIn](https://www.linkedin.com/in/prof-ahmed-elnaggar/)
