🚀 legal_t5_small_trans_es_en_small_finetuned Model
A model designed for translating legal text from Spanish to English, offering high-quality translation in the legal domain.
🚀 Quick Start
The legal_t5_small_trans_es_en_small_finetuned model is dedicated to translating legal text from Spanish to English. It was first introduced in this repository. The model is pre-trained on all translation data through an unsupervised task and then fine-tuned on three parallel corpora from JRC-ACQUIS, Europarl, and DCEP.
✨ Features
- Unsupervised Pretraining: Initially pre-trained on an unsupervised "masked language modelling" task using all the training set data.
- Based on t5-small: Built upon the t5-small model, it uses d_model = 512, d_ff = 2,048, 8-headed attention, and 6 layers each in the encoder and decoder, scaling down the baseline t5 model (the configuration values can be checked as shown in the sketch after this list).
- Smaller Parameter Count: With about 60 million parameters, it is a relatively small model.
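As a quick sanity check, the architecture hyperparameters listed above can be read from the model configuration. This is a minimal sketch; the attribute names follow the standard `T5Config` fields in `transformers`, and the expected values in the comments are the t5-small defaults cited above:

```python
from transformers import AutoConfig

# Load the configuration of the fine-tuned checkpoint.
config = AutoConfig.from_pretrained("SEBIS/legal_t5_small_trans_es_en_small_finetuned")

print(config.d_model)             # expected: 512
print(config.d_ff)                # expected: 2048
print(config.num_heads)           # expected: 8
print(config.num_layers)          # encoder layers, expected: 6
print(config.num_decoder_layers)  # decoder layers, expected: 6
```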
💻 Usage Examples
Basic Usage
Here is how to use this model to translate legal text from Spanish to English in PyTorch:
```python
from transformers import AutoTokenizer, AutoModelWithLMHead, TranslationPipeline

# Build a translation pipeline from the fine-tuned checkpoint and the base tokenizer.
pipeline = TranslationPipeline(
    model=AutoModelWithLMHead.from_pretrained("SEBIS/legal_t5_small_trans_es_en_small_finetuned"),
    tokenizer=AutoTokenizer.from_pretrained(
        pretrained_model_name_or_path="SEBIS/legal_t5_small_trans_es_en",
        do_lower_case=False,
        skip_special_tokens=True,
    ),
    device=0,  # GPU index; use -1 to run on CPU
)

# Spanish legal text to translate into English.
es_text = "de Jonas Sjöstedt (GUE/NGL)"

pipeline([es_text], max_length=512)
```
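Note that `AutoModelWithLMHead` is deprecated in recent releases of `transformers`. A minimal equivalent sketch, assuming a recent `transformers` version, uses `AutoModelForSeq2SeqLM` instead:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, TranslationPipeline

pipeline = TranslationPipeline(
    model=AutoModelForSeq2SeqLM.from_pretrained("SEBIS/legal_t5_small_trans_es_en_small_finetuned"),
    tokenizer=AutoTokenizer.from_pretrained("SEBIS/legal_t5_small_trans_es_en", do_lower_case=False),
    device=0,  # set to -1 to run on CPU
)

print(pipeline(["de Jonas Sjöstedt (GUE/NGL)"], max_length=512))
```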
📚 Documentation
Model description
The legal_t5_small_trans_es_en_small_finetuned model is initially pre-trained on an unsupervised "masked language modelling" task with all the data in the training set. It is based on the t5-small model and trained on a large parallel text corpus. This smaller model scales down the t5 baseline model, having about 60 million parameters.
Intended uses & limitations
The model is suitable for translating legal texts from Spanish to English.
Training data
The legal_t5_small_trans_es_en_small_finetuned model was trained on the [JRC-ACQUIS](https://wt-public.emm4u.eu/Acquis/index_2.2.html), EUROPARL, and [DCEP](https://ec.europa.eu/jrc/en/language-technologies/dcep) datasets, which together contain 9 million parallel texts.
Training procedure
- Overall Training: The model was trained on a single TPU Pod V3-8 for 250K steps in total, using a sequence length of 512 (batch size 4096). It has approximately 220M parameters and uses an encoder-decoder architecture.
- Preprocessing: A unigram model was trained with 88M lines of text from the parallel corpus (of all possible language pairs) to obtain the vocabulary (with byte pair encoding) for this model; see the sketch after this list.
- Pretraining: The pre-training data was the combined data from all 42 language pairs, and the task was to predict randomly masked portions of a sentence.
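For illustration only, a unigram vocabulary of this kind can be built with the `sentencepiece` library. The input file name and vocabulary size below are placeholder assumptions, not the values used for this model:

```python
import sentencepiece as spm

# Train a unigram model on a (hypothetical) concatenated parallel corpus.
# "parallel_corpus.txt" and vocab_size=32000 are illustrative assumptions.
spm.SentencePieceTrainer.train(
    input="parallel_corpus.txt",
    model_prefix="legal_t5_vocab",
    model_type="unigram",
    vocab_size=32000,
)

# Load the resulting model and tokenize a sample sentence.
sp = spm.SentencePieceProcessor(model_file="legal_t5_vocab.model")
print(sp.encode("de Jonas Sjöstedt (GUE/NGL)", out_type=str))
```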
Evaluation results
When used on the translation test dataset, the model achieves the following results:
| Model | BLEU score |
|-------|------------|
| legal_t5_small_trans_es_en_small_finetuned | 54.481 |
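As an illustration of how such a corpus-level BLEU score can be computed on a held-out test set, here is a minimal sketch using `sacrebleu`; the test file paths ("test.es", "test.en") are hypothetical placeholders:

```python
import sacrebleu
from transformers import AutoTokenizer, AutoModelWithLMHead, TranslationPipeline

# Hypothetical test files: one Spanish source and one English reference per line.
with open("test.es") as f:
    sources = [line.strip() for line in f]
with open("test.en") as f:
    references = [line.strip() for line in f]

pipeline = TranslationPipeline(
    model=AutoModelWithLMHead.from_pretrained("SEBIS/legal_t5_small_trans_es_en_small_finetuned"),
    tokenizer=AutoTokenizer.from_pretrained("SEBIS/legal_t5_small_trans_es_en", do_lower_case=False),
    device=0,
)

hypotheses = [out["translation_text"] for out in pipeline(sources, max_length=512)]

# Corpus-level BLEU against a single reference set.
print(sacrebleu.corpus_bleu(hypotheses, [references]).score)
```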
BibTeX entry and citation info
Created by Ahmed Elnaggar/@Elnaggar_AI | [LinkedIn](https://www.linkedin.com/in/prof-ahmed-elnaggar/)
| Property | Details |
|----------|---------|
| Model Type | A model for translating legal text from Spanish to English, based on t5-small and fine-tuned on specific corpora |
| Training Data | Trained on [JRC-ACQUIS](https://wt-public.emm4u.eu/Acquis/index_2.2.html), EUROPARL, and [DCEP](https://ec.europa.eu/jrc/en/language-technologies/dcep) datasets with 9 million parallel texts |