Open-source model legal_t5_small_multitask_sv_it - Free deployment for translating Swedish legal texts into Italian

Legal T5 Small Multitask Sv It

Developed by SEBIS

This model is designed for translating legal texts from Swedish to Italian, incorporating multi-task learning scenarios including translation tasks and unsupervised tasks.

Machine Translation #Legal text translation #Swedish-Italian #Multi-task learning

Downloads 21

Release Time : 3/2/2022

Model Overview

This model specializes in Swedish-to-Italian legal text translation. It is trained in a multi-task setup that combines supervised translation with an unsupervised task, using data from three parallel corpora (JRC-ACQUIS, Europarl, and DCEP).

Model Features

Multi-task learning

The model combines translation tasks with unsupervised tasks (masked language model prediction), improving translation quality.

Multi-corpus training

Training data comes from three parallel corpora (jrc-acquis, europarl, and dcep), covering 42 language pairs.

Legal text optimization

Specially optimized for legal texts, suitable for legal document translation needs.

Model Capabilities

Swedish-to-Italian translation

Legal text processing

Use Cases

Legal document translation

EU legal document translation

Translating EU legal documents from Swedish to Italian

BLEU score 44.242

Regulatory agency document translation

Translating legal documents from national regulatory agencies

🚀 legal_t5_small_multitask_sv_it model

A model for translating legal text from Swedish to Italian, offering efficient and accurate legal language translation services.

🚀 Quick Start

The legal_t5_small_multitask_sv_it model is designed for translating legal text from Swedish to Italian. It was first released in this repository. The model is trained on three parallel corpora with 42 language pairs from JRC-ACQUIS, Europarl, and DCEP, along with an unsupervised task similar to a masked language model prediction task.

💻 Usage Examples

Basic Usage

Here is how to use this model to translate legal text from Swedish to Italian in PyTorch:

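import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, TranslationPipeline

model_name = "SEBIS/legal_t5_small_multitask_sv_it"

# AutoModelForSeq2SeqLM replaces the deprecated AutoModelWithLMHead for T5-style models.
pipeline = TranslationPipeline(
    model=AutoModelForSeq2SeqLM.from_pretrained(model_name),
    tokenizer=AutoTokenizer.from_pretrained(model_name, do_lower_case=False, skip_special_tokens=True),
    # Use the first GPU when one is available; otherwise fall back to CPU (-1).
    device=0 if torch.cuda.is_available() else -1,
)

sv_text = "De nationella tillsynsmyndigheterna får använda"

# The pipeline returns a list of dicts, one per input, keyed by "translation_text".
print(pipeline([sv_text], max_length=512))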
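
Legal documents are usually much longer than a single sentence, so in practice the input has to be split before it is passed to the pipeline. The following is a minimal sketch of one way to do that, not part of the original model card. The helper names `split_sentences` and `translate_document`, the batch size, and the naive punctuation-based splitting rule are all assumptions made for illustration. The only thing it relies on from the library is that a translation pipeline returns a list of dicts with a `translation_text` key.

```python
import re
from typing import Callable, Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def translate_document(
    translator: Callable[..., List[Dict[str, str]]],
    text: str,
    batch_size: int = 8,      # hypothetical default, tune for your hardware
    max_length: int = 512,    # matches the sequence length used in the example above
) -> str:
    """Translate a long text sentence by sentence, in small batches."""
    sentences = split_sentences(text)
    translated: List[str] = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        outputs = translator(batch, max_length=max_length)
        translated.extend(item["translation_text"] for item in outputs)
    return " ".join(translated)
```

With the `pipeline` object from the example above, this could be called as `translate_document(pipeline, long_sv_text)`. Note that the simple splitting rule will also break on abbreviations such as "art." that are common in legal text, so a proper sentence segmenter is likely a better choice for production use.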

✨ Features

Multitask Learning: No pretraining is involved. Instead, an unsupervised task is added to all translation tasks to achieve a multitask learning scenario.
Parallel Training: Trained on three parallel corpora with 42 language pairs from JRC-ACQUIS, Europarl, and DCEP.

📚 Documentation

Model description

The legal_t5_small_multitask_sv_it model involves no pretraining. Instead, the unsupervised task is added alongside all the translation tasks to realize a multitask learning scenario.
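
To make the multitask idea more concrete, the sketch below builds one supervised translation example and one unsupervised masked-span example from the same kind of data. The model card only says the unsupervised task is "similar to a masked language model prediction task", so the T5-style sentinel tokens (`<extra_id_0>`, `<extra_id_1>`, ...), the function name `masked_example`, the hand-picked span positions, and the Italian target sentence are all assumptions for illustration, not the authors' actual data pipeline.

```python
from typing import List, Tuple


def masked_example(tokens: List[str], spans: List[Tuple[int, int]]) -> Tuple[str, str]:
    """Replace each (start, end) token span with a sentinel, T5 span-corruption style.

    `spans` must be sorted and non-overlapping. The input keeps the unmasked
    tokens plus sentinels; the target lists each sentinel followed by the
    tokens it hides, closed by one final sentinel.
    """
    inputs: List[str] = []
    targets: List[str] = []
    cursor = 0
    for sentinel_id, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{sentinel_id}>"
        inputs.extend(tokens[cursor:start])
        inputs.append(sentinel)
        targets.append(sentinel)
        targets.extend(tokens[start:end])
        cursor = end
    inputs.extend(tokens[cursor:])
    targets.append(f"<extra_id_{len(spans)}>")
    return " ".join(inputs), " ".join(targets)


# Supervised task: Swedish source paired with an Italian target (illustrative pair).
translation_pair = (
    "De nationella tillsynsmyndigheterna får använda",
    "Le autorità nazionali di regolamentazione possono utilizzare",
)

# Unsupervised task: mask spans of a sentence and ask the model to restore them.
swedish_tokens = "De nationella tillsynsmyndigheterna får använda".split()
masked_pair = masked_example(swedish_tokens, spans=[(1, 2), (3, 5)])

# Both kinds of (input, target) pairs are mixed into one training stream.
training_examples = [translation_pair, masked_pair]
```

Here `masked_pair` is `("De <extra_id_0> tillsynsmyndigheterna <extra_id_1>", "<extra_id_0> nationella <extra_id_1> får använda <extra_id_2>")`. Because both tasks share the same text-to-text format, a single encoder-decoder model can be trained on them jointly.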

Intended uses & limitations

The model can be used to translate legal texts from Swedish to Italian.

Training data

The legal_t5_small_multitask_sv_it model was trained on the JRC-ACQUIS, EUROPARL, and DCEP datasets, which together consist of 8 million parallel texts. The supervised task used only the corresponding language pair (Swedish-Italian), while the unsupervised task had access to the data of all language pairs.

Training procedure

The model was trained on a single TPU Pod V3-8 for 250K steps in total, using sequence length 512 (batch size 4096). It has a total of approximately 220M parameters and uses an encoder-decoder architecture. The optimizer is AdaFactor with an inverse square root learning rate schedule.
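
For readers unfamiliar with the inverse square root schedule, here is a minimal sketch of its usual form: the learning rate stays constant during a warm-up phase and then decays as 1/sqrt(step). The model card does not state the warm-up length, so the `warmup_steps` default of 10,000 (the value used in the original T5 setup) is an assumption, as is the function name.

```python
import math


def inverse_sqrt_lr(step: int, warmup_steps: int = 10_000) -> float:
    """Learning rate 1 / sqrt(max(step, warmup_steps)).

    warmup_steps=10_000 is an assumed value; the model card does not give it.
    """
    return 1.0 / math.sqrt(max(step, warmup_steps))


# Learning rate at a few points of a 250K-step run.
for step in (0, 10_000, 100_000, 250_000):
    print(step, inverse_sqrt_lr(step))
```

Under this assumption the rate is held at 0.01 for the first 10,000 steps and has decayed to 0.002 by step 250,000.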

Preprocessing

A unigram model was trained on 88M lines of text from the parallel corpus (of all possible language pairs) to build the vocabulary (with byte pair encoding) that is used with this model.

Evaluation results

On the translation test dataset, the model achieves the following result:

Model	BLEU score
legal_t5_small_multitask_sv_it	44.242
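
As a reminder of what the BLEU number measures, the sketch below computes a simplified corpus-level BLEU: the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty and scaled to 0-100. The model card does not say which BLEU implementation or tokenization was used, so this whitespace-tokenized, single-reference, unsmoothed version is an assumption for illustration and should not be expected to reproduce the reported score.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Simplified corpus BLEU with one reference per hypothesis, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp_tokens, n)
            ref_ngrams = ngram_counts(ref_tokens, n)
            # Counter intersection keeps the minimum count, i.e. clipped matches.
            matches[n - 1] += sum((hyp_ngrams & ref_ngrams).values())
            totals[n - 1] += sum(hyp_ngrams.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


score = corpus_bleu(["il gatto è sul tavolo"], ["il gatto è sul tappeto"])
```

For reported results, an established tool with standardized tokenization is preferable to a hand-rolled function like this one.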

BibTeX entry and citation info

Created by Ahmed Elnaggar/@Elnaggar_AI | LinkedIn

🔧 Technical Details

The model was trained on a single TPU Pod V3-8 for 250K steps in total, using sequence length 512 (batch size 4096). It has a total of approximately 220M parameters and uses an encoder-decoder architecture. The optimizer is AdaFactor with an inverse square root learning rate schedule. A unigram model was trained on 88M lines of text from the parallel corpus to build the vocabulary (with byte pair encoding).
