Legal_t5_small_trans_de_es Open Source Model - Professionally Translate German Legal Texts into Spanish for Free

Legal T5 Small Trans De Es

Developed by SEBIS

This model is designed for translating legal texts from German to Spanish, trained on the t5-small architecture, suitable for professional translation tasks in the legal domain.

Machine Translation #Legal text translation #German-Spanish #EU document specialization

Downloads 14

Release Time: 3/2/2022

Model Overview

A legal text translation model trained on three parallel corpora: jrc-acquis, europarl, and dcep, specifically handling German to Spanish legal document translation.

Model Features

Legal domain specialization

Specifically trained for legal texts, capable of accurately handling legal terminology and complex sentence structures.

Efficient small model

Optimized based on the t5-small architecture, reducing computational resource requirements while maintaining translation quality.

Multi-source training data

Integrated training from three authoritative legal corpora: JRC-ACQUIS, EUROPARL, and DCEP.

Model Capabilities

Legal text translation

Accurate terminology conversion

Complex sentence processing

Use Cases

Legal document processing

EU legal document translation

Translating German versions of EU regulations/directives into Spanish versions

Maintaining the precision and format compliance of legal texts

Cross-border legal document processing

Assisting legal institutions in multilingual document conversion for cross-border cases

🚀 legal_t5_small_trans_de_es model

A model for translating legal text from German to Spanish. It offers an efficient solution for legal document translation between these two languages.

🚀 Quick Start

The legal_t5_small_trans_de_es model is designed for translating legal text from German to Spanish. It was first released in this repository and trained on three parallel corpora from JRC-ACQUIS, Europarl, and DCEP.

✨ Features

Based on t5-small: The model is built on the t5-small architecture, a scaled-down configuration with d_model = 512, d_ff = 2,048, 8-headed attention, and only 6 layers each in the encoder and decoder. It has about 60 million parameters.
Trained on Large Parallel Corpus: It was trained on a large corpus of parallel text, enabling accurate legal text translation.
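
The figure of about 60 million parameters can be cross-checked from the dimensions listed above. The sketch below is a back-of-the-envelope count, not a readout of the actual checkpoint. It assumes a shared 32,128-entry vocabulary, a per-head dimension of 64, bias-free projections, and 32 relative-position buckets. These values follow the standard t5-small configuration and are not stated in this card, so this model's real vocabulary size may differ.

```python
def t5_param_estimate(d_model=512, d_ff=2048, n_heads=8, n_layers=6,
                      d_kv=64, vocab_size=32128, rel_buckets=32):
    """Rough parameter count for a T5-style encoder-decoder.

    d_kv, vocab_size and rel_buckets are assumptions taken from the
    standard t5-small configuration, not from this model card.
    """
    inner = n_heads * d_kv
    attention = 4 * d_model * inner      # query, key, value, output projections (no biases)
    feed_forward = 2 * d_model * d_ff    # two dense layers (no biases)
    layer_norm = d_model                 # scale-only layer norm

    encoder_layer = attention + feed_forward + 2 * layer_norm
    decoder_layer = 2 * attention + feed_forward + 3 * layer_norm  # self- and cross-attention

    embedding = vocab_size * d_model           # shared by encoder, decoder and output head
    relative_bias = 2 * rel_buckets * n_heads  # one bias table per stack
    final_norms = 2 * layer_norm               # one final norm per stack

    return (embedding
            + n_layers * (encoder_layer + decoder_layer)
            + relative_bias
            + final_norms)


print(f"{t5_param_estimate():,}")  # 60,506,624
```

Under these assumptions, roughly a quarter of the parameters sit in the embedding table, so a different vocabulary size would shift the total noticeably.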

📦 Installation

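The original model card lists no installation steps. The usage example requires the Hugging Face transformers library with PyTorch; T5 tokenizers generally also need the sentencepiece package.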

💻 Usage Examples

Basic Usage

Here is how to use this model to translate legal text from German to Spanish in PyTorch:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, TranslationPipeline

# AutoModelForSeq2SeqLM replaces the deprecated AutoModelWithLMHead
# for encoder-decoder models such as T5.
pipeline = TranslationPipeline(
    model=AutoModelForSeq2SeqLM.from_pretrained("SEBIS/legal_t5_small_trans_de_es"),
    tokenizer=AutoTokenizer.from_pretrained(
        pretrained_model_name_or_path="SEBIS/legal_t5_small_trans_de_es",
        do_lower_case=False,
        skip_special_tokens=True,
    ),
    device=0,  # first CUDA GPU; use device=-1 to run on CPU
)

de_text = "7. betont, dass die Kommission und die Mitgliedstaaten die Rolle der Frauen in der Sozialwirtschaft aufgrund der hohen Frauenerwerbstätigkeit in dem Sektor und der Bedeutung der Dienstleistungen, die er für die Förderung der Vereinbarkeit von Beruf und Privatleben bietet, aufwerten, unterstützen und verstärken müssen;"

# Returns one dict per input text; the Spanish output is under "translation_text".
pipeline([de_text], max_length=512)
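
The example above hard-codes device=0, which assumes a CUDA GPU is present. A minimal sketch of a CPU fallback follows. The helper names pick_device and build_pipeline are hypothetical, not part of the model card or of transformers, and the tokenizer is loaded with default arguments for brevity.

```python
def pick_device():
    """Return 0 (first CUDA GPU) if PyTorch can see one, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


def build_pipeline(model_name="SEBIS/legal_t5_small_trans_de_es"):
    """Build the translation pipeline on whichever device is available."""
    # Imported lazily so pick_device() works even without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TranslationPipeline

    return TranslationPipeline(
        model=AutoModelForSeq2SeqLM.from_pretrained(model_name),
        tokenizer=AutoTokenizer.from_pretrained(model_name),
        device=pick_device(),
    )
```

Calling build_pipeline() downloads the checkpoint on first use; the returned object is then called the same way as in the example, with a list of German strings and max_length=512.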

📚 Documentation

Model description

The legal_t5_small_trans_de_es model is based on the t5-small model and trained on a large parallel text corpus. It's a smaller model with specific architectural settings, resulting in about 60 million parameters.

Intended uses & limitations

The model is intended for the translation of legal texts from German to Spanish.

Training data

The model was trained on the JRC-Acquis, Europarl, and DCEP datasets, which together consist of 5 million parallel texts.

Training procedure

Overall Training: The model was trained on a single TPU Pod V3-8 for 250K steps in total, using a sequence length of 512 (batch size 4096). It has approximately 60 million parameters and uses the encoder-decoder architecture.
Optimizer: The optimizer is AdaFactor with an inverse square root learning rate schedule for pre-training.
Preprocessing: A unigram model was trained on 88M lines of text from the parallel corpus (all possible language pairs) to build the vocabulary (with byte pair encoding) used with this model.
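
The inverse square root schedule named above can be sketched as follows. The card does not give the warm-up length or any scaling constant, so the 10,000-step warm-up here is an assumption borrowed from the commonly cited T5 setup, and the function name is illustrative.

```python
import math


def inverse_sqrt_lr(step, warmup_steps=10_000):
    """Inverse square root schedule: constant during warm-up, then decaying as 1/sqrt(step).

    warmup_steps=10_000 is an assumed value; this model card does not state it.
    """
    return 1.0 / math.sqrt(max(step, warmup_steps))


# Learning rate at a few points of a 250K-step run.
for step in (1, 10_000, 100_000, 250_000):
    print(step, round(inverse_sqrt_lr(step), 5))
```

With this assumed warm-up, the rate stays at 0.01 for the first 10,000 steps and falls to 0.002 by step 250,000.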

Evaluation results

On the translation test dataset, the model achieves the following result:

Model	BLEU score
legal_t5_small_trans_de_es	47.24
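
The card does not say which BLEU implementation or tokenization produced this score. As a rough illustration of what the metric measures, here is a simplified corpus-level BLEU: whitespace tokenization, one reference per sentence, no smoothing. Published scores normally come from a standard evaluation tool, so numbers from this sketch are not directly comparable to 47.24.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU on a 0-100 scale (single reference, no smoothing)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp_tokens, n)
            ref_ngrams = ngram_counts(ref_tokens, n)
            # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += sum(hyp_ngrams.values())

    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives a zero score

    # Geometric mean of the n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for output shorter than the reference.
    brevity = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)
```

A translation identical to its reference scores 100, and one sharing no words with it scores 0; real system outputs fall in between.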

BibTeX entry and citation info

Created by Ahmed Elnaggar/@Elnaggar_AI | LinkedIn
