legal_t5_small_trans_en_cs Open-source Legal Translation Model - Free Translation of Legal Documents from English to Czech

Legal T5 Small Trans En Cs

Developed by SEBIS

A legal text translation model based on the T5-small architecture, specializing in English-to-Czech legal document translation.

Machine Translation #Legal text translation #English-Czech bilingual #Lightweight T5


Release time: 3/2/2022

Model Overview

This model is designed specifically for translating legal texts from English to Czech. It was trained on three parallel corpora: JRC-Acquis, Europarl, and DCEP.

Model Features

Legal domain specialization

Trained specifically on legal parallel corpora, so it is adapted to legal terminology and sentence structures.

Lightweight architecture

Uses the compact T5-small configuration (about 60 million parameters), which keeps computational resource requirements low while still reaching a BLEU score of 50.177 on the test set.

Multi-source training data

Combines three EU parallel corpora: JRC-Acquis, Europarl, and DCEP.

Model Capabilities

Legal text translation

Specialized terminology handling

Long sentence structure parsing

Use Cases

Legal document processing

EU legal document translation

Translates EU legal documents from English to Czech.

Achieves a BLEU score of 50.177 on the test set.

Cross-border legal agreement processing

Assists in handling bilingual legal agreements involving English and Czech

🚀 legal_t5_small_trans_en_cs model

This model translates legal text from English to Czech and is intended for legal document processing between these two languages. It was first released in this repository and was trained on three parallel corpora: JRC-Acquis, Europarl, and DCEP.

✨ Features

Based on the t5-small model and trained on a large parallel text corpus.
A smaller model with about 60 million parameters that scales down the baseline T5 model.
Intended for translating legal texts from English to Czech.

📦 Installation

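The usage example below requires the Hugging Face transformers library with a PyTorch backend. Loading the T5 tokenizer usually also requires the sentencepiece package, for example: pip install transformers torch sentencepiece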

💻 Usage Examples

Basic Usage

Here is how to use this model to translate legal text from English to Czech in PyTorch:

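```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, TranslationPipeline

model_name = "SEBIS/legal_t5_small_trans_en_cs"

# AutoModelForSeq2SeqLM replaces the deprecated AutoModelWithLMHead for encoder-decoder models such as T5.
# The pipeline already decodes with skip_special_tokens=True, so no extra tokenizer arguments are needed.
pipeline = TranslationPipeline(
    model=AutoModelForSeq2SeqLM.from_pretrained(model_name),
    tokenizer=AutoTokenizer.from_pretrained(model_name),
    device=0 if torch.cuda.is_available() else -1,  # first GPU if available, otherwise CPU
)

en_text = "1 In the countries concerned, this certainly affects the priority assigned to making progress on the issue of final disposal, particularly of highly radioactive waste and irradiated fuel elements."

# The pipeline returns one dict per input; the Czech output is stored under "translation_text".
result = pipeline([en_text], max_length=512)
print(result[0]["translation_text"])
```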

📚 Documentation

Model description

The legal_t5_small_trans_en_cs model is based on the t5-small model and trained on a large corpus of parallel text. It is a smaller model that scales down the baseline T5 model by using d_model = 512, d_ff = 2,048, 8-headed attention, and only 6 layers each in the encoder and decoder. This variant has about 60 million parameters.
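To see where "about 60 million parameters" comes from, the dimensions above can be turned into a parameter count. This is a sketch that assumes the standard T5 v1.0 layout (no biases in linear layers, RMS layer norms with only a scale vector, relative position bias in the first layer of each stack, and one embedding matrix shared by encoder, decoder, and output head). The values d_kv = 64, num_buckets = 32, and vocab_size = 32,128 are assumptions taken from the public t5-small configuration; this model trained its own vocabulary, so its real count shifts by (vocab_size − 32,128) × 512.

```python
def t5_param_count(vocab_size, d_model, d_ff, num_heads, d_kv, num_layers, num_buckets=32):
    """Parameter count of a T5 v1.0-style encoder-decoder with a tied input/output embedding."""
    inner = num_heads * d_kv                       # width of the q/k/v projections
    attention = 4 * d_model * inner                # q, k, v and output projections (no biases)
    ffn = 2 * d_model * d_ff                       # wi and wo of the ReLU feed-forward block
    rel_bias = num_buckets * num_heads             # relative position bias, first layer only
    encoder_layer = attention + ffn + 2 * d_model  # plus 2 layer-norm scale vectors
    decoder_layer = 2 * attention + ffn + 3 * d_model  # self- and cross-attention, 3 layer norms
    encoder = num_layers * encoder_layer + d_model + rel_bias  # plus final layer norm
    decoder = num_layers * decoder_layer + d_model + rel_bias
    embedding = vocab_size * d_model               # shared by encoder, decoder and LM head
    return embedding + encoder + decoder


# Dimensions from the description; d_kv and vocab_size are assumed t5-small defaults.
n = t5_param_count(vocab_size=32128, d_model=512, d_ff=2048, num_heads=8, d_kv=64, num_layers=6)
print(f"{n:,}")  # 60,506,624
```

About 16.4M of these parameters sit in the embedding matrix, which is why the vocabulary size noticeably affects the total.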

Intended uses & limitations

The model is designed for the translation of legal texts from English to Czech.

Training data

The legal_t5_small_trans_en_cs model was trained on the [JRC-Acquis](https://wt-public.emm4u.eu/Acquis/index_2.2.html), EUROPARL, and [DCEP](https://ec.europa.eu/jrc/en/language-technologies/dcep) datasets, which consist of 5 million parallel texts.

Training procedure

Overall training: The model was trained on a single TPU Pod V3-8 for 250K steps in total, using a sequence length of 512 (batch size 4096). It has approximately 220M parameters and was trained using the encoder-decoder architecture.
Preprocessing: A unigram model was trained on 88M lines of text from the parallel corpus (covering all possible language pairs) to build the vocabulary (with byte pair encoding) used by this model.
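A unigram language model assigns a log-probability to each vocabulary piece and splits a word into the sequence of pieces with the highest total log-probability, which a Viterbi search finds efficiently. The sketch below shows only this segmentation step, not vocabulary training. The vocabulary and its log-probabilities are invented for illustration and are not this model's actual vocabulary; the ▁ prefix marks the start of a word, following the SentencePiece convention.

```python
import math


def unigram_segment(word, log_probs):
    """Viterbi search for the highest-probability split of `word` into vocabulary pieces."""
    n = len(word)
    best = [-math.inf] * (n + 1)  # best[i]: best total log-probability of word[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last piece ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if piece in log_probs and best[start] > -math.inf:
                score = best[start] + log_probs[piece]
                if score > best[end]:
                    best[end], back[end] = score, start
    if best[n] == -math.inf:
        raise ValueError(f"cannot segment {word!r} with this vocabulary")
    # Walk the back-pointers from the end of the word to recover the pieces.
    pieces, end = [], n
    while end > 0:
        start = back[end]
        pieces.append(word[start:end])
        end = start
    return pieces[::-1]


# Hypothetical toy vocabulary with made-up log-probabilities (not the model's real vocabulary).
toy_log_probs = {
    "▁legal": -3.0, "▁leg": -4.0, "al": -3.5, "ity": -4.0,
    "▁": -2.0, "l": -5.0, "e": -5.0, "g": -5.0, "a": -5.0,
}
print(unigram_segment("▁legality", toy_log_probs))  # ['▁legal', 'ity']
```

Here "▁legal" + "ity" (total −7.0) beats alternatives such as "▁leg" + "al" + "ity" (total −11.5), so the more frequent whole-word piece wins.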

Evaluation results

On the translation test dataset, the model achieves the following results:

| Model | BLEU score |
|---|---|
| legal_t5_small_trans_en_cs | 50.177 |
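The card does not state which BLEU implementation or tokenization produced 50.177. As an illustration of what the metric measures, here is a minimal corpus-level BLEU sketch in plain Python: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty, scaled to 0 to 100. It assumes whitespace tokenization and a single reference per sentence, so its scores are not directly comparable with the reported figure.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), one reference per hypothesis, whitespace tokenization."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clipped precision: an n-gram counts at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp_tokens) - n + 1, 0)
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish output that is shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


print(round(corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"]), 1))  # 53.7
```

For real evaluations, a standard tool such as sacreBLEU is preferable, because BLEU values depend heavily on tokenization and smoothing choices.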

BibTeX entry and citation info

Created by Ahmed Elnaggar/@Elnaggar_AI | [LinkedIn](https://www.linkedin.com/in/prof-ahmed-elnaggar/)
