Legal_t5_small_trans_de_en: a free, open-source model for translating legal text from German to English

Legal T5 Small Trans De En

Developed by SEBIS

This model is designed for translating legal texts from German to English, based on the t5-small architecture, suitable for legal document translation tasks.

Machine Translation #Legal text translation #German to English #EU law specialized

Downloads 18

Release Time: 3/2/2022

Model Overview

A legal text translation model based on the T5-small architecture, specifically designed for German to English legal document translation, trained on parallel legal corpora such as JRC-ACQUIS, EUROPARL, and DCEP.

Model Features

Legal domain specialization

Specially trained for legal texts, capable of accurately handling legal terminology and complex sentence structures.

Multi-corpus training

Trained on three major parallel legal corpora—JRC-ACQUIS, EUROPARL, and DCEP—covering a wide range of legal text types.

Efficient small model

Optimized based on the T5-small architecture, reducing computational resource requirements while maintaining performance.

Model Capabilities

Legal text translation

Accurate terminology conversion

Complex sentence processing

Use Cases

Legal document translation

EU legal document translation

Accurately translating EU legal documents from German to English

BLEU score 49.1

Corporate contract translation

Translating German contract documents into English versions

🚀 legal_t5_small_trans_de_en model

A model designed for translating legal text from German to English. It was initially released in this repository and trained on three parallel corpora from JRC-ACQUIS, Europarl, and DCEP.

🚀 Quick Start

The legal_t5_small_trans_de_en model is used for translating legal text from German to English. Here is a simple example of using this model in PyTorch:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TranslationPipeline

model_name = "SEBIS/legal_t5_small_trans_de_en"

# AutoModelForSeq2SeqLM replaces the deprecated AutoModelWithLMHead for T5-style models.
pipeline = TranslationPipeline(
    model=AutoModelForSeq2SeqLM.from_pretrained(model_name),
    tokenizer=AutoTokenizer.from_pretrained(
        model_name, do_lower_case=False, skip_special_tokens=True
    ),
    device=0,  # first GPU; use device=-1 to run on CPU
)

de_text = "Eisenbahnunternehmen müssen Fahrkarten über mindestens einen der folgenden Vertriebswege anbieten: an Fahrkartenschaltern oder Fahrkartenautomaten, per Telefon, Internet oder jede andere in weitem Umfang verfügbare Informationstechnik oder in den Zügen."

# Translate a batch of one sentence and print the result.
print(pipeline([de_text], max_length=512))

✨ Features

Translation Task: Specialized in translating legal text from German to English.
Based on t5-small: Built upon the t5-small model, with a smaller scale and about 60 million parameters.

📦 Installation

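The code example above uses the transformers library with a PyTorch backend, and the T5 tokenizer additionally needs sentencepiece. You can install them using the following command:

pip install transformers torch sentencepiece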

💻 Usage Examples

Basic Usage

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TranslationPipeline

model_name = "SEBIS/legal_t5_small_trans_de_en"

# AutoModelForSeq2SeqLM replaces the deprecated AutoModelWithLMHead for T5-style models.
pipeline = TranslationPipeline(
    model=AutoModelForSeq2SeqLM.from_pretrained(model_name),
    tokenizer=AutoTokenizer.from_pretrained(
        model_name, do_lower_case=False, skip_special_tokens=True
    ),
    device=0,  # first GPU; use device=-1 to run on CPU
)

de_text = "Eisenbahnunternehmen müssen Fahrkarten über mindestens einen der folgenden Vertriebswege anbieten: an Fahrkartenschaltern oder Fahrkartenautomaten, per Telefon, Internet oder jede andere in weitem Umfang verfügbare Informationstechnik oder in den Zügen."

# Translate a batch of one sentence and print the result.
print(pipeline([de_text], max_length=512))
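The pipeline call returns a list with one dictionary per input, and the translation pipeline in transformers stores the translated string under the "translation_text" key. A minimal sketch for pairing each source sentence with its translation is shown below. The helper name pair_translations is hypothetical and not part of the library.

```python
def pair_translations(sources, outputs):
    """Pair each source text with the translation returned for it.

    `outputs` is expected to have the shape a translation pipeline returns:
    a list of dicts, each carrying a "translation_text" key.
    """
    if len(sources) != len(outputs):
        raise ValueError("expected exactly one output per source text")
    return [(src, out["translation_text"]) for src, out in zip(sources, outputs)]


# Usage with the objects from the example above (needs the model loaded):
# pairs = pair_translations([de_text], pipeline([de_text], max_length=512))
# for german, english in pairs:
#     print(german, "->", english)
```

Keeping source and translation together makes it easier to review long batches side by side.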

Advanced Usage

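In more complex scenarios, you can adjust generation parameters such as max_length, which caps the length of the generated translation in tokens. Note that the model was trained with a sequence length of 512, so very long inputs may translate less reliably.

# Raise max_length when the expected translation is long
print(pipeline([de_text], max_length=1024))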
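For long documents, an alternative to raising max_length is to split the text into smaller pieces and translate them one by one. The sketch below groups sentences into chunks under a token budget. It is an illustration under assumptions: split_into_chunks is a hypothetical helper, the default token count is a plain whitespace word count rather than the model's tokenizer, and the punctuation-based sentence split will mis-split legal abbreviations such as "Abs." or "Nr.".

```python
import re


def split_into_chunks(text, max_tokens, count_tokens=lambda s: len(s.split())):
    """Group sentences into chunks whose estimated token count stays within max_tokens.

    A single sentence that exceeds the budget is kept as its own chunk.
    """
    # Naive split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            chunks.append(current)  # budget exceeded: close the current chunk
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("Eins zwei. Drei vier. Fünf sechs.", max_tokens=4))
```

To count real tokens, pass something like count_tokens=lambda s: len(tokenizer.encode(s)) with the tokenizer loaded for this model, then feed the resulting chunks to the pipeline as a list.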

📚 Documentation

Model description

The legal_t5_small_trans_de_en model is based on the t5-small model and was trained on a large parallel text corpus. It is a smaller model that scales down the baseline model of T5 by using d_model = 512, d_ff = 2,048, 8-headed attention, and only 6 layers each in the encoder and decoder. This variant has about 60 million parameters.
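The figure of about 60 million parameters can be checked from the hyperparameters above. The sketch below counts weights for a T5-style encoder-decoder with bias-free linear layers, scale-only layer norms, one relative position bias table per stack, and input embeddings shared with the output layer. Several values are assumptions not stated in this document: a per-head dimension of 64, 32 relative position buckets, and the stock t5-small vocabulary size of 32,128 (this model uses its own vocabulary, so its exact count may differ).

```python
def t5_param_count(d_model=512, d_ff=2048, n_heads=8, d_kv=64,
                   n_layers=6, vocab_size=32128, n_buckets=32):
    """Count parameters of a T5-style encoder-decoder (assumed layout, see lead-in)."""
    inner = n_heads * d_kv
    attention = 4 * d_model * inner    # q, k, v, o projections, no biases
    feed_forward = 2 * d_model * d_ff  # two dense layers, no biases
    norm = d_model                     # scale-only layer norm

    encoder_layer = attention + feed_forward + 2 * norm
    decoder_layer = 2 * attention + feed_forward + 3 * norm  # adds cross-attention
    rel_bias = n_buckets * n_heads     # one relative position table per stack

    encoder = n_layers * encoder_layer + rel_bias + norm  # plus final layer norm
    decoder = n_layers * decoder_layer + rel_bias + norm
    embeddings = vocab_size * d_model  # shared between input and output
    return embeddings + encoder + decoder


print(f"{t5_param_count():,}")  # 60,506,624
```

Under these assumptions, the embedding table alone accounts for roughly 16 million of the total, which is why a different vocabulary size shifts the count noticeably.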

Intended uses & limitations

The model is intended for translating legal text from German to English.

Training data

The legal_t5_small_trans_de_en model was trained on JRC-ACQUIS, EUROPARL, and DCEP datasets, which consist of 5 million parallel texts.

Training procedure

The model was trained on a single TPU Pod V3-8 for a total of 250K steps, using a sequence length of 512 (batch size 4096). It uses an encoder-decoder architecture with approximately 60M parameters in total, consistent with the t5-small configuration described above. The optimizer is AdaFactor with an inverse square root learning rate schedule for pre-training.
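An inverse square root schedule keeps the learning rate constant during warm-up and then decays it in proportion to 1/sqrt(step). The sketch below follows the form used in the original T5 setup, lr = 1 / sqrt(max(step, warmup_steps)). The warm-up length of 10,000 steps is an assumption taken from that setup; this document does not state the value used for this model.

```python
import math


def inverse_sqrt_lr(step, warmup_steps=10_000):
    """Learning rate at a given step: flat during warm-up, then 1/sqrt(step) decay."""
    return 1.0 / math.sqrt(max(step, warmup_steps))


# Sample the schedule at a few points up to the 250K training steps.
for step in (1, 10_000, 40_000, 250_000):
    print(step, inverse_sqrt_lr(step))
```

With these assumed settings the rate stays at 0.01 for the first 10,000 steps and has fallen to 0.002 by step 250K.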

Preprocessing

A unigram model was trained with 88M lines of text from the parallel corpus (of all possible language pairs) to obtain the vocabulary (with byte pair encoding), which is used with this model.

Pretraining

The model was pre-trained using the above methods to achieve good performance on the legal text translation task.

🔧 Technical Details

The model is based on the t5-small architecture, which is a sequence-to-sequence model. By reducing hyperparameters such as d_model, d_ff, the number of attention heads, and the number of layers, the model's scale is brought down to about 60 million parameters, making it lightweight and suitable for resource-constrained scenarios.

📄 License

No license information is provided in the original document.

📊 Evaluation results

When the model is used for translation on the test dataset, it achieves the following results:

Model	BLEU score
legal_t5_small_trans_de_en	49.1
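BLEU measures n-gram overlap between the model's translations and reference translations, on a scale from 0 to 100. The sketch below is a simplified corpus-level BLEU with whitespace tokenization and one reference per sentence. This document does not say which BLEU tool or tokenization produced the 49.1 score, so numbers from this sketch are for illustration only and are not comparable with it. The function names are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU: clipped n-gram precision with a brevity penalty."""
    matches, totals = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Counter intersection clips each n-gram at its reference count.
            matches[n - 1] += sum((h_counts & r_counts).values())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # no smoothing: any empty n-gram order gives zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


reference = ["railway undertakings shall offer tickets via at least one channel"]
print(corpus_bleu(reference, reference))  # identical texts score 100.0
```

For reportable scores, an established BLEU implementation with a documented tokenization is the safer choice.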

BibTeX entry and citation info

Created by Ahmed Elnaggar/@Elnaggar_AI | LinkedIn
