legal_t5_small_trans_cs_sv: Open-Source Model for Translating Czech Legal Texts to Swedish

Legal T5 Small Trans Cs Sv

Developed by SEBIS

Small T5 model for translating legal texts from Czech to Swedish

Tags: Machine Translation, Legal text translation, Czech-Swedish, Small parameter optimization

Release date: 3/2/2022

Model Overview

This model is designed for translating legal texts from Czech to Swedish. It was trained on three parallel corpora: JRC-ACQUIS, Europarl, and DCEP.

Model Features

Legal domain optimization

Specially trained for legal text translation, capable of handling professional legal terminology

Efficient architecture

Lightweight architecture based on T5-small with approximately 60 million parameters, suitable for resource-limited environments

Multi-dataset training

Combined training on three authoritative legal corpora: JRC-ACQUIS, EUROPARL, and DCEP

Model Capabilities

Legal text translation

Professional terminology processing

Czech to Swedish conversion

Use Cases

Legal document translation

EU legal document translation

Translating EU legal documents from Czech to Swedish

BLEU score 47.9

Legal training material translation

Translating professional legal training course materials

🚀 legal_t5_small_trans_cs_sv Model

A model designed for translating legal text from Czech to Swedish, offering efficient and accurate legal language conversion.

🚀 Quick Start

The legal_t5_small_trans_cs_sv model is dedicated to translating legal text from Czech to Swedish. It was initially released in this repository and trained on three parallel corpora from JRC-ACQUIS, Europarl, and DCEP.

✨ Features

Based on the t5-small model, trained on a large parallel text corpus.
A smaller model with about 60 million parameters, scaling down the baseline t5 model.
Suitable for translating legal texts from Czech to Swedish.

📦 Installation

The usage example below requires the Hugging Face transformers library with a PyTorch backend, and typically sentencepiece for the T5 tokenizer.

💻 Usage Examples

Basic Usage

Here is how to use this model to translate legal text from Czech to Swedish in PyTorch:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TranslationPipeline

# AutoModelForSeq2SeqLM replaces the deprecated AutoModelWithLMHead for T5 models.
pipeline = TranslationPipeline(
    model=AutoModelForSeq2SeqLM.from_pretrained("SEBIS/legal_t5_small_trans_cs_sv"),
    tokenizer=AutoTokenizer.from_pretrained(
        pretrained_model_name_or_path="SEBIS/legal_t5_small_trans_cs_sv",
        do_lower_case=False,
        skip_special_tokens=True,
    ),
    device=0,  # first GPU; use device=-1 to run on CPU
)

# Czech source sentence to translate
cs_text = "Odborná příprava je v sektoru minimální a tradiční, postrádá specifické kurzy nebo výukové plány."

# The pipeline takes a list of inputs and returns one result per input
print(pipeline([cs_text], max_length=512))
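
The pipeline call above returns a list with one dictionary per input, each holding the translated string under the `translation_text` key. The following is a minimal sketch of a wrapper that extracts the plain strings. `translate_batch` and `fake_pipeline` are hypothetical names used only for illustration; they are not part of the model card or of transformers. The wrapper accepts any callable shaped like the pipeline, so it can be tried without downloading the model.

```python
from typing import Callable, Dict, List, Sequence


def translate_batch(
    translate: Callable[..., List[Dict[str, str]]],
    texts: Sequence[str],
    max_length: int = 512,
) -> List[str]:
    """Run a translation pipeline over `texts` and return plain strings.

    `translate` is expected to behave like a TranslationPipeline: called
    with a list of strings, it returns one dict per input that contains a
    "translation_text" entry.
    """
    outputs = translate(list(texts), max_length=max_length)
    return [item["translation_text"] for item in outputs]


def fake_pipeline(texts, max_length=512):
    """Stand-in for the real pipeline (assumption: same output shape)."""
    return [{"translation_text": f"<sv>{t}</sv>"} for t in texts]


if __name__ == "__main__":
    # With the real model, pass the `pipeline` object instead of fake_pipeline.
    print(translate_batch(fake_pipeline, ["Odborná příprava je minimální."]))
```

Passing the real `pipeline` object in place of `fake_pipeline` should give Swedish strings directly, which is convenient when translating a document sentence by sentence.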

📚 Documentation

Model Description

The legal_t5_small_trans_cs_sv model is based on the t5-small model and trained on a large parallel text corpus. It is a smaller model that scales down the baseline t5 model by using d_model = 512, d_ff = 2,048, 8-headed attention, and only 6 layers each in the encoder and decoder. This variant has about 60 million parameters.
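
As a rough check on the "about 60 million" figure, the parameter count can be estimated from the dimensions listed above. This is a sketch under stated assumptions: the vocabulary size (32,128), the per-head key/value size (64), the 32 relative-position buckets, and the shared input/output embedding are taken from the standard t5-small configuration, not from this model card, and the function name is hypothetical.

```python
def t5_param_count(
    vocab_size: int = 32128,  # assumption: standard t5-small vocabulary size
    d_model: int = 512,
    d_ff: int = 2048,
    n_heads: int = 8,
    d_kv: int = 64,           # assumption: per-head key/value dimension
    n_layers: int = 6,
    rel_buckets: int = 32,    # assumption: relative-position buckets
) -> int:
    """Estimate the parameter count of a T5-style encoder-decoder."""
    inner = n_heads * d_kv
    attention = 4 * d_model * inner      # Q, K, V, and output projections (no biases)
    feed_forward = 2 * d_model * d_ff    # two dense layers (no biases)
    layer_norm = d_model                 # T5 layer norm has a scale vector only

    encoder_layer = attention + feed_forward + 2 * layer_norm
    decoder_layer = 2 * attention + feed_forward + 3 * layer_norm  # self + cross attention

    rel_bias = rel_buckets * n_heads     # one relative-position table per stack
    stack_extras = layer_norm + rel_bias # final layer norm + position table

    embedding = vocab_size * d_model     # shared by encoder, decoder, and output head
    return embedding + n_layers * (encoder_layer + decoder_layer) + 2 * stack_extras


if __name__ == "__main__":
    print(f"{t5_param_count():,} parameters")
```

Under these assumptions the estimate lands close to 60 million; a different vocabulary size would shift the embedding term but leave the encoder and decoder counts unchanged.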

Intended Uses & Limitations

The model can be used for translating legal texts from Czech to Swedish.

🔧 Technical Details

Training Data

The legal_t5_small_trans_cs_sv model was trained on [JRC-ACQUIS](https://wt-public.emm4u.eu/Acquis/index_2.2.html), EUROPARL, and [DCEP](https://ec.europa.eu/jrc/en/language-technologies/dcep) datasets, which consist of 5 million parallel texts.

Training Procedure

The model was trained on a single TPU Pod V3-8 for a total of 250K steps, using a sequence length of 512 (batch size 4096). It has approximately 60 million parameters in total and uses an encoder-decoder architecture. The optimizer is AdaFactor with an inverse square root learning rate schedule for pre-training.
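
The inverse square root schedule mentioned above can be sketched as follows. The warm-up length of 10,000 steps is an assumption borrowed from the general T5 training recipe; the model card does not state the value used for this model, and the function name is hypothetical.

```python
import math


def inverse_sqrt_lr(step: int, warmup_steps: int = 10_000) -> float:
    """Inverse square root learning rate schedule.

    The rate stays constant at 1/sqrt(warmup_steps) during warm-up and then
    decays as 1/sqrt(step). `warmup_steps` is an assumed value, not one
    taken from the model card.
    """
    return 1.0 / math.sqrt(max(step, warmup_steps))


if __name__ == "__main__":
    for step in (1, 10_000, 40_000, 250_000):
        print(step, inverse_sqrt_lr(step))
```

With this shape the learning rate is flat early in training and falls slowly afterwards, so the final steps of a 250K-step run still make meaningful updates.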

Preprocessing

A unigram model was trained with 88M lines of text from the parallel corpus (of all possible language pairs) to obtain the vocabulary (with byte pair encoding), which is used with this model.
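
To illustrate how a unigram vocabulary is applied at tokenization time, the sketch below picks the most probable segmentation of a word with a Viterbi search. The vocabulary and log-probabilities in `TOY_LOGP` are invented for illustration; the real vocabulary was learned from the 88M training lines and is not shown in the model card.

```python
import math
from typing import Dict, List


def segment(text: str, logp: Dict[str, float]) -> List[str]:
    """Return the highest-scoring split of `text` into vocabulary pieces.

    Each piece has a log-probability; the score of a segmentation is the
    sum of its pieces' log-probabilities (unigram assumption).
    """
    max_len = max(len(piece) for piece in logp)
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last piece
    best[0] = 0.0

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in logp and best[start] + logp[piece] > best[end]:
                best[end] = best[start] + logp[piece]
                back[end] = start

    if best[n] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")

    # Walk the back-pointers from the end to recover the pieces.
    pieces = []
    end = n
    while end > 0:
        start = back[end]
        pieces.append(text[start:end])
        end = start
    return pieces[::-1]


# Hypothetical toy vocabulary: whole word, two subwords, and single characters.
TOY_LOGP = {"smlouva": -3.0, "sml": -2.5, "ouva": -2.5, **{c: -6.0 for c in "smlouva"}}

if __name__ == "__main__":
    print(segment("smlouva", TOY_LOGP))
```

Frequent legal terms that earn their own vocabulary entry are kept whole, while rarer words fall back to shorter pieces, which is what keeps the vocabulary compact across many language pairs.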

Evaluation Results

On the translation test dataset, the model achieves the following result:

Model	BLEU score
legal_t5_small_trans_cs_sv	47.9

📄 License

No license information is provided for this model.

BibTeX entry and citation info

Created by Ahmed Elnaggar/@Elnaggar_AI | LinkedIn
