# S2T2-Wav2Vec2-CoVoST2-EN-AR-ST

`s2t-wav2vec2-large-en-ar` is a Speech to Text Transformer model trained for end-to-end Speech Translation (ST). It translates English speech directly into Arabic text.
## 🚀 Quick Start

This model can be used for end-to-end English speech to Arabic text translation. You can use it directly via the ASR pipeline or step by step, as shown in the usage examples below.
## ✨ Features

- End-to-End Translation: translates English speech directly into Arabic text.
- Transformer-Based: uses a transformer-based seq2seq (speech encoder-decoder) architecture.
- Pretrained Encoder: employs a pretrained Wav2Vec2 model as the encoder.
## 📦 Installation

The original document lists no specific installation steps. The usage examples below assume the `transformers`, `datasets`, `torch`, and `soundfile` packages are installed.
## 💻 Usage Examples

### Basic Usage

```python
from datasets import load_dataset
from transformers import pipeline

# Load a small LibriSpeech validation split for demonstration
librispeech_en = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")

# The ASR pipeline handles feature extraction, generation, and decoding
asr = pipeline(
    "automatic-speech-recognition",
    model="facebook/s2t-wav2vec2-large-en-ar",
    feature_extractor="facebook/s2t-wav2vec2-large-en-ar",
)
translation = asr(librispeech_en[0]["file"])
```
### Advanced Usage

```python
import soundfile as sf
import torch
from datasets import load_dataset
from transformers import Speech2Text2Processor, SpeechEncoderDecoderModel

model = SpeechEncoderDecoderModel.from_pretrained("facebook/s2t-wav2vec2-large-en-ar")
processor = Speech2Text2Processor.from_pretrained("facebook/s2t-wav2vec2-large-en-ar")

def map_to_array(batch):
    # Read the raw waveform from disk
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch

ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
ds = ds.map(map_to_array)

# The processor expects audio sampled at 16 kHz
inputs = processor(ds["speech"][0], sampling_rate=16_000, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_features"],
    attention_mask=inputs["attention_mask"],
)
transcription = processor.batch_decode(generated_ids)
```
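The processor above expects audio sampled at 16 kHz (the LibriSpeech dummy set already is). Audio recorded at a different rate must be resampled first. Below is a minimal linear-interpolation sketch in NumPy; in practice a dedicated library such as librosa or torchaudio is preferable:

```python
import numpy as np

def resample_linear(audio, orig_sr, target_sr=16_000):
    """Resample a 1-D waveform via linear interpolation (illustrative only)."""
    if orig_sr == target_sr:
        return audio
    duration = len(audio) / orig_sr
    n_target = int(round(duration * target_sr))
    # Evaluate the original signal at evenly spaced target-rate time positions
    orig_times = np.arange(len(audio)) / orig_sr
    target_times = np.arange(n_target) / target_sr
    return np.interp(target_times, orig_times, audio)

# 0.5 s of a 440 Hz tone recorded at 44.1 kHz, resampled to 16 kHz
sr = 44_100
t = np.arange(int(0.5 * sr)) / sr
tone = np.sin(2 * np.pi * 440 * t)
tone_16k = resample_linear(tone, sr)
```

Linear interpolation is enough for a sketch but performs no anti-aliasing, which matters when downsampling real speech.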
## 📚 Documentation

### Model description

S2T2 is a transformer-based seq2seq (speech encoder-decoder) model designed for end-to-end Automatic Speech Recognition (ASR) and Speech Translation (ST). It uses a pretrained Wav2Vec2 as the encoder and a transformer-based decoder. The model is trained with standard autoregressive cross-entropy loss and generates the translations autoregressively.
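The autoregressive cross-entropy objective mentioned above can be illustrated with a toy example (made-up logits and a tiny vocabulary, not the actual training code): at each decoding step, the decoder's predicted distribution over the vocabulary is scored against the next target token, and the per-step negative log-likelihoods are averaged.

```python
import numpy as np

def autoregressive_cross_entropy(logits, targets):
    """Mean cross-entropy of per-step logits against next-token targets.

    logits: (steps, vocab) unnormalized scores, one row per decoding step
    targets: (steps,) index of the correct token at each step
    """
    # Numerically stable log-softmax over the vocabulary axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of each target token, averaged over steps
    return -log_probs[np.arange(len(targets)), targets].mean()

# Toy decoder outputs: 3 steps over a 5-token vocabulary
logits = np.array([
    [2.0, 0.1, 0.1, 0.1, 0.1],   # confident in token 0 (correct)
    [0.1, 0.1, 3.0, 0.1, 0.1],   # confident in token 2 (correct)
    [0.5, 0.5, 0.5, 0.5, 0.5],   # uniform: contributes log(5) to the loss
])
targets = np.array([0, 2, 4])
loss = autoregressive_cross_entropy(logits, targets)
```

During training the target tokens come from the reference translation (teacher forcing); at inference time the model instead feeds back its own predictions step by step.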
### Intended uses & limitations

This model can be used for end-to-end English speech to Arabic text translation. See the model hub to look for other S2T2 checkpoints.
### How to use

As this is a standard sequence-to-sequence transformer model, you can use the `generate` method to produce transcripts by passing the speech features to the model.
## 🔧 Technical Details

The S2T2 model was proposed in *Large-Scale Self- and Semi-Supervised Learning for Speech Translation* (https://arxiv.org/abs/2104.06678) and officially released in Fairseq.
## 📄 License

This model is licensed under the MIT license.
## Additional Information

### Evaluation results

CoVoST2 test results for en-ar (BLEU score): 20.2

For more information, please have a look at the official paper, especially row 10 of Table 2.
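BLEU, the metric reported above, scores a candidate translation by its clipped n-gram overlap with reference translations. The sketch below illustrates modified n-gram precision for a single sentence pair; it is an illustration only, as reported scores are corpus-level BLEU computed with a standard toolkit:

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision of a tokenized candidate against one reference."""
    cand_ngrams = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref_ngrams = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    # Each candidate n-gram is credited at most as often as it appears in the reference
    clipped = sum(min(count, ref_ngrams[ng]) for ng, count in cand_ngrams.items())
    total = sum(cand_ngrams.values())
    return clipped / total if total else 0.0

candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
p1 = ngram_precision(candidate, reference, 1)  # unigram precision: 5/6
p2 = ngram_precision(candidate, reference, 2)  # bigram precision: 3/5
```

Full BLEU combines the precisions for n = 1..4 as a geometric mean and applies a brevity penalty for candidates shorter than the reference.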
### BibTeX entry and citation info

```bibtex
@article{DBLP:journals/corr/abs-2104-06678,
  author    = {Changhan Wang and
               Anne Wu and
               Juan Miguel Pino and
               Alexei Baevski and
               Michael Auli and
               Alexis Conneau},
  title     = {Large-Scale Self- and Semi-Supervised Learning for Speech Translation},
  journal   = {CoRR},
  volume    = {abs/2104.06678},
  year      = {2021},
  url       = {https://arxiv.org/abs/2104.06678},
  archivePrefix = {arXiv},
  eprint    = {2104.06678},
  timestamp = {Thu, 12 Aug 2021 15:37:06 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2104-06678.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
### Information Table

| Property | Details |
|----------|---------|
| Model Type | Speech to Text Transformer for end-to-end Speech Translation |
| Training Data | covost2, librispeech_asr |
| Tags | audio, speech-translation, automatic-speech-recognition, speech2text2 |
| Pipeline Tag | automatic-speech-recognition |