🚀 legal_t5_small_multitask_cs_it Model
A model for efficient and accurate translation of legal text from Czech to Italian.
🚀 Quick Start
The legal_t5_small_multitask_cs_it model is dedicated to translating legal text from Czech to Italian. It was first released in this repository. The model is trained jointly on three parallel corpora spanning 42 language pairs (JRC-ACQUIS, Europarl, and DCEP), along with an unsupervised task that follows the prediction objective of a masked language model.
✨ Features
- Multitask Learning: Combines unsupervised tasks with translation tasks to achieve multitask learning without pretraining.
- Parallel Training: Trained on multiple parallel corpora to improve translation performance.
💻 Usage Examples
Basic Usage
Here is how to use this model to translate legal text from Czech to Italian in PyTorch:
```python
from transformers import AutoTokenizer, AutoModelWithLMHead, TranslationPipeline

# Build a translation pipeline from the pretrained model and tokenizer.
pipeline = TranslationPipeline(
    model=AutoModelWithLMHead.from_pretrained("SEBIS/legal_t5_small_multitask_cs_it"),
    tokenizer=AutoTokenizer.from_pretrained(
        pretrained_model_name_or_path="SEBIS/legal_t5_small_multitask_cs_it",
        do_lower_case=False,
        skip_special_tokens=True,
    ),
    device=0,  # GPU 0; use device=-1 to run on CPU
)

cs_text = "Příprava Evropské rady (29.-30. října 2009)"

pipeline([cs_text], max_length=512)
```
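The pipeline returns one dictionary per input sentence; the Italian output is stored under the `translation_text` key. A brief illustration, assuming the pipeline built above:

```python
results = pipeline([cs_text], max_length=512)
print(results[0]["translation_text"])  # the Italian translation of cs_text
```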
📚 Documentation
Model Description
No pretraining is involved in the legal_t5_small_multitask_cs_it model. Instead, an unsupervised task is added to all translation tasks to realize the multitask learning scenario.
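For illustration only, the sketch below shows one way such a multitask step could combine a supervised translation loss with an unsupervised masked-prediction loss. The sentinel-token corruption, the example sentences, and the use of AutoModelForSeq2SeqLM are assumptions for the sketch, not the authors' released training code.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("SEBIS/legal_t5_small_multitask_cs_it")
model = AutoModelForSeq2SeqLM.from_pretrained("SEBIS/legal_t5_small_multitask_cs_it")

# Supervised task: a Czech source sentence paired with its Italian reference.
src = tokenizer("Příprava Evropské rady", return_tensors="pt")
tgt = tokenizer("Preparazione del Consiglio europeo", return_tensors="pt")
translation_loss = model(
    input_ids=src.input_ids, attention_mask=src.attention_mask, labels=tgt.input_ids
).loss

# Unsupervised task: predict the original text from a corrupted copy, in the
# spirit of masked language modelling. The exact corruption scheme used for
# training is not documented here; this sentinel replacement is an assumption.
corrupted = tokenizer("Příprava <extra_id_0> rady", return_tensors="pt")
original = tokenizer("Příprava Evropské rady", return_tensors="pt")
unsupervised_loss = model(
    input_ids=corrupted.input_ids,
    attention_mask=corrupted.attention_mask,
    labels=original.input_ids,
).loss

# Multitask objective: combine both losses for a single backward pass.
(translation_loss + unsupervised_loss).backward()
```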
Intended Uses & Limitations
The model can be used for translating legal texts from Czech to Italian.
🔧 Technical Details
Training Data
The legal_t5_small_multitask_cs_it model (the supervised task using only the corresponding language pair, and the unsupervised task using data from all language pairs) was trained on the JRC-ACQUIS, Europarl, and DCEP datasets, which consist of 5 million parallel texts.
Training Procedure
- Hardware: The model was trained on a single TPU Pod V3-8.
- Steps: A total of 250K steps.
- Sequence Length: 512 (batch size 4096).
- Parameters: Approximately 220M parameters.
- Architecture: Encoder-decoder architecture.
- Optimizer: AdaFactor with an inverse square root learning rate schedule (see the configuration sketch after this list).
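As a rough illustration of that optimizer setting, the following sketch configures the AdaFactor implementation shipped with the transformers library using its built-in inverse-square-root relative-step schedule. The exact training hyperparameters are not published here, so treat these values as assumptions.

```python
from transformers import Adafactor, AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("SEBIS/legal_t5_small_multitask_cs_it")

optimizer = Adafactor(
    model.parameters(),
    lr=None,             # let AdaFactor derive the step size internally
    relative_step=True,  # step size decays roughly as 1/sqrt(step)
    warmup_init=True,    # short warm-up before the inverse-sqrt decay
    scale_parameter=True,
)
```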
Preprocessing
A unigram model was trained with 88M lines of text from the parallel corpus (of all possible language pairs) to obtain the vocabulary (using byte-pair encoding), which is used with this model.
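For orientation, here is a minimal sketch of training a SentencePiece unigram model over a combined corpus file. The input file name and the vocabulary size are hypothetical placeholders, not the values used for this model.

```python
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="all_language_pairs.txt",  # hypothetical: one sentence per line
    model_prefix="legal_t5_vocab",   # writes legal_t5_vocab.model / .vocab
    model_type="unigram",            # unigram language model segmentation
    vocab_size=32000,                # assumption; the actual size may differ
    character_coverage=1.0,          # keep all characters (diacritics etc.)
)
```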
Evaluation Results
On the corresponding translation test dataset, the model achieves the following result:
| Model | BLEU score |
|-------|------------|
| legal_t5_small_multitask_cs_it | 45.297 |
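BLEU scores of this kind are commonly computed with sacreBLEU. The sketch below shows the general pattern, using placeholder hypothesis and reference strings rather than real model outputs.

```python
import sacrebleu

hypotheses = ["La traduzione italiana prodotta dal modello"]     # placeholder model outputs
references = ["La traduzione italiana di riferimento"]           # placeholder gold translations

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.3f}")
```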
BibTeX entry and citation info
Created by Ahmed Elnaggar/@Elnaggar_AI | LinkedIn
📄 License
No license information is provided in the original document.