legal_t5_small_trans_sv_en Open Source Model - Free and Efficient English Translation of Swedish Legal Texts

Legal T5 Small Trans Sv En

Developed by SEBIS

Lightweight T5 model specialized for translating legal texts from Swedish to English, trained on three parallel corpora

Machine Translation #Legal text translation #Swedish-English #Lightweight T5


Release Time : 3/2/2022

Model Overview

This model is specifically designed for translating legal texts from Swedish to English, trained on three major parallel corpora: jrc-acquis, europarl, and dcep, using the lightweight T5-small architecture.

Model Features

Legal domain specialization

Optimized for legal text translation, ensuring accurate handling of legal terminology

Lightweight design

Utilizes the T5-small architecture with around 60 million parameters, requiring lower computational resources

Multi-source training data

Integrates three authoritative legal corpora: JRC-ACQUIS, EUROPARL, and DCEP

Model Capabilities

Legal text translation

Swedish to English translation

Specialized terminology handling

Use Cases

Legal document translation

EU legal document translation

Translate EU legal documents from Swedish to English

Maintains accuracy in legal terminology and sentence structure

Judicial procedure document processing

Translate legal litigation-related documents and investigative files

Ensures precise conveyance of procedural expressions

🚀 legal_t5_small_trans_sv_en Model

A model designed for translating legal text from Swedish to English, offering efficient and accurate translation services.

🚀 Quick Start

The legal_t5_small_trans_sv_en model translates legal text from Swedish to English. It was first released in this repository and was trained on three parallel corpora: jrc-acquis, europarl, and dcep.

✨ Features

Based on the t5-small model and trained on a large parallel text corpus.
A smaller-scale model with about 60 million parameters, obtained by scaling down the baseline t5 model.
Translates legal texts from Swedish to English.

📦 Installation

The original model card gives no installation steps. The usage example below imports from the transformers library and runs in PyTorch, so both must be installed first.

💻 Usage Examples

Basic Usage

Here is how to use this model to translate legal text from Swedish to English in PyTorch:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TranslationPipeline

# AutoModelForSeq2SeqLM replaces the deprecated AutoModelWithLMHead for T5 models.
pipeline = TranslationPipeline(
    model=AutoModelForSeq2SeqLM.from_pretrained("SEBIS/legal_t5_small_trans_sv_en"),
    tokenizer=AutoTokenizer.from_pretrained(
        pretrained_model_name_or_path="SEBIS/legal_t5_small_trans_sv_en",
        do_lower_case=False,
        skip_special_tokens=True,
    ),
    device=0,  # first GPU; use device=-1 to run on CPU
)

sv_text = "Om rättsliga förfaranden inleds rörande omständigheter som ombudsmannen utreder skall han avsluta ärendet."

# Returns a list with one dict per input text, holding the English translation.
print(pipeline([sv_text], max_length=512))
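Legal documents are often much longer than the 512-token limit used in the call above. One possible way to handle this, sketched here under stated assumptions, is to split a document into sentence groups before translating. The helper name chunk_sentences, the regex-based sentence splitter, and the use of a whitespace word count as a stand-in for the tokenizer's token count are all illustrative choices, not part of the original model card.

```python
import re


def chunk_sentences(text, max_words=200):
    """Group sentences into chunks of at most max_words whitespace-separated words.

    Assumption: word count is only a rough proxy for the model's token count.
    A single sentence longer than max_words is kept whole as its own chunk.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    demo = "A b c. D e f. G h i."
    print(chunk_sentences(demo, max_words=6))
```

The resulting list can be passed to the pipeline in place of [sv_text], and the translated chunks joined afterwards. A production setup would more likely measure length with the model's own tokenizer and use a sentence splitter that understands legal abbreviations.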

📚 Documentation

Model Description

The legal_t5_small_trans_sv_en model is based on the t5-small model and trained on a large parallel text corpus. It is a smaller model that scales down the baseline t5 model. It uses d_model = 512, d_ff = 2,048, 8-headed attention, and only 6 layers each in the encoder and decoder. This variant has about 60 million parameters.
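The figure of about 60 million parameters can be checked roughly from the dimensions given above. The sketch below is a back-of-the-envelope count, not a readout of the actual checkpoint. It assumes the standard T5 layout: no bias terms, scale-only layer norms, attention projections of size d_model x d_model, one shared embedding matrix, and 32 relative-position buckets per stack. The vocabulary size of 32,128 is the standard T5 value and is an assumption here, since this model uses its own vocabulary whose size the card does not state.

```python
def t5_param_count(d_model=512, d_ff=2048, n_heads=8, n_layers=6,
                   vocab_size=32128, rel_buckets=32):
    """Approximate parameter count of a T5-style encoder-decoder.

    vocab_size and rel_buckets are assumed values (standard T5 defaults).
    """
    # Attention block: query, key, value and output projections, no biases.
    # Assumes n_heads * head_dim == d_model.
    attn = 4 * d_model * d_model
    # Feed-forward block: d_model -> d_ff -> d_model, no biases.
    ff = 2 * d_model * d_ff
    # Each sub-block has one scale-only layer norm of size d_model.
    norm = d_model

    encoder_layer = (attn + norm) + (ff + norm)
    # Decoder layers add a cross-attention block on top of self-attention.
    decoder_layer = 2 * (attn + norm) + (ff + norm)

    # Per stack: one relative-position bias table plus a final layer norm.
    stack_extra = rel_buckets * n_heads + norm

    # One embedding matrix shared by encoder, decoder and output head.
    embedding = vocab_size * d_model

    return (embedding
            + n_layers * encoder_layer + stack_extra
            + n_layers * decoder_layer + stack_extra)


if __name__ == "__main__":
    print(f"{t5_param_count():,} parameters")
```

Under these assumptions the count comes to roughly 60.5 million, in line with the figure quoted above. A different vocabulary size would shift only the embedding term.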

Intended Uses & Limitations

The model can be used for translating legal texts from Swedish to English.

🔧 Technical Details

Training Data

The legal_t5_small_trans_sv_en model was trained on the [JRC-ACQUIS](https://wt-public.emm4u.eu/Acquis/index_2.2.html), EUROPARL, and [DCEP](https://ec.europa.eu/jrc/en/language-technologies/dcep) datasets, which consist of 5 million parallel texts.

Training Procedure

The model was trained on a single TPU Pod V3-8 for 250K steps in total, using a sequence length of 512 (batch size 4096). It uses the encoder-decoder architecture and has approximately 60 million parameters in total, consistent with the model description. The optimizer is AdaFactor with an inverse square root learning rate schedule for pre-training.
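An inverse square root schedule keeps the learning rate constant for an initial warm-up period and then decays it as 1/sqrt(step). The sketch below shows the shape of such a schedule over the 250K training steps. The warm-up length of 10,000 steps and the function name inverse_sqrt_lr are assumptions for illustration; the card does not state the warm-up value that was actually used.

```python
import math


def inverse_sqrt_lr(step, warmup_steps=10_000):
    """Learning rate at a training step: flat during warm-up, then 1/sqrt(step).

    warmup_steps is an assumed value, not taken from the model card.
    """
    return 1.0 / math.sqrt(max(step, warmup_steps))


if __name__ == "__main__":
    for step in (1, 10_000, 100_000, 250_000):
        print(f"step {step:>7}: lr = {inverse_sqrt_lr(step):.5f}")
```

With these assumed settings the rate stays at 0.01 through warm-up and falls to 0.002 by step 250,000.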

Preprocessing

A unigram model was trained on 88M lines of text from the parallel corpus (covering all possible language pairs) to obtain the vocabulary (with byte pair encoding) used with this model.
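To give a feel for how a unigram vocabulary is applied at tokenization time, the sketch below finds the most probable split of a word into known pieces using dynamic programming. It is a toy illustration only, not the authors' tokenizer. The vocabulary TOY_PROBS and its probabilities are invented for the example, whereas a real vocabulary would be learned from the 88M lines mentioned above.

```python
import math

# Hypothetical piece probabilities, invented for this example.
TOY_PROBS = {"rätt": 0.2, "s": 0.1, "lig": 0.05, "a": 0.1, "liga": 0.1}
TOY_LOG_PROBS = {piece: math.log(p) for piece, p in TOY_PROBS.items()}


def unigram_segment(text, log_probs):
    """Return the highest-probability split of text into vocabulary pieces."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best log-probability of text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last piece
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            if piece in log_probs:
                score = best[start] + log_probs[piece]
                if score > best[end]:
                    best[end] = score
                    back[end] = start
    if best[n] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")
    # Walk the back-pointers from the end to recover the pieces.
    pieces = []
    i = n
    while i > 0:
        pieces.append(text[back[i]:i])
        i = back[i]
    return pieces[::-1]


if __name__ == "__main__":
    print(unigram_segment("rättsliga", TOY_LOG_PROBS))
```

Here the split "rätt" + "s" + "liga" wins over "rätt" + "s" + "lig" + "a" because fewer, more probable pieces give a higher total probability.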

Evaluation Results

When the model is used on the translation test dataset, it achieves the following results:

| Model | BLEU score |
| --- | --- |
| legal_t5_small_trans_sv_en | 52.025 |
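BLEU measures how many word n-grams in the model output also appear in a reference translation, with a penalty for output that is too short. The card does not say which tool or tokenization produced the score above, so the sketch below is only a simplified single-reference version with whitespace tokenization and no smoothing. The function names are illustrative, and scores from this sketch are not directly comparable to the reported 52.025.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU (0-100), one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Counter intersection clips each n-gram to its reference count.
            matches[n - 1] += sum((h_counts & r_counts).values())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # no smoothing: any empty n-gram order gives zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for output shorter than the reference.
    penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * penalty * math.exp(log_precision)


if __name__ == "__main__":
    ref = "the ombudsman shall close the case"
    print(corpus_bleu(["the ombudsman shall close the case now"], [ref]))
```

An output identical to its reference scores 100, and one sharing no words with it scores 0.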

📄 License

No license information is provided in the original document.

BibTeX entry and citation info

Created by Ahmed Elnaggar/@Elnaggar_AI | [LinkedIn](https://www.linkedin.com/in/prof-ahmed-elnaggar/)
