BiBERT-ende
BiBERT-ende is a bilingual (English-German) pretrained language model optimized for Neural Machine Translation (NMT), enhancing translation performance through contextual embeddings.
Machine Translation
Transformers · Supports Multiple Languages · Bilingual Pretraining · Neural Machine Translation · Contextual Embeddings

Downloads: 40
Release Time: 3/2/2022
Model Overview
BiBERT-ende is a customized bilingual pretrained language model designed to simplify the integration of existing pretrained models by directly feeding contextual embeddings into the NMT encoder, achieving state-of-the-art translation performance.
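The key idea above — bypassing the NMT encoder's own token-embedding table and feeding it the pretrained model's contextual vectors directly — can be sketched in a few lines. Everything here is illustrative (function names, dimensions, and the trivial "encoder" are assumptions, not BiBERT's actual implementation):

```python
# Minimal sketch of "contextual embeddings as NMT encoder input".
# All names and shapes are hypothetical, not BiBERT's real code.

def contextual_embed(tokens, dim=4):
    """Stand-in for a pretrained bilingual model: maps each token to a
    context-dependent vector (here a trivial deterministic one)."""
    return [[float((len(tok) + i + d) % 7) for d in range(dim)]
            for i, tok in enumerate(tokens)]

def nmt_encoder(embeddings):
    """Toy encoder: instead of looking up its own learned embedding table,
    it consumes the precomputed contextual vectors directly (reduced here
    to a per-position mean just to keep the sketch short)."""
    return [sum(vec) / len(vec) for vec in embeddings]

src = ["the", "cat", "sits"]
enc_out = nmt_encoder(contextual_embed(src))  # one value per source token
```

The point of the design is that the pretrained model stays a drop-in feature extractor: no architecture surgery on the NMT encoder is needed beyond accepting external vectors as input.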
Model Features
Bilingual Pretraining
Specifically pretrained for English and German, optimizing cross-lingual contextual understanding.
Simplified Integration
Simplifies the integration of pretrained models by directly using contextual embeddings as NMT encoder input.
Random Layer Selection
Introduces random layer selection to ensure full utilization of different levels of contextual embeddings.
Bidirectional Translation Model
Supports bidirectional translation (English→German and German→English) with high performance in both directions.
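The random-layer-selection feature listed above can be sketched as follows — during training, one of the pretrained model's layer outputs is sampled uniformly so the NMT encoder sees embeddings from every depth, while inference uses a fixed layer. The layer count, dummy tensors, and the last-layer inference choice are assumptions for illustration:

```python
import random

# Assumed: a 12-layer pretrained encoder whose per-layer outputs are available.
NUM_LAYERS = 12

def hidden_states(sentence_len, dim=4):
    """Dummy stand-in for the per-layer contextual embeddings a pretrained
    model would return: one (sentence_len x dim) matrix per layer."""
    return [[[0.0] * dim for _ in range(sentence_len)]
            for _ in range(NUM_LAYERS)]

def select_layer(states, training, rng=random):
    """Training: sample one layer uniformly, so different levels of
    contextual representation all get used. Inference: deterministically
    use the last layer (one plausible choice; the real model may differ)."""
    if training:
        return states[rng.randrange(len(states))]
    return states[-1]

states = hidden_states(sentence_len=5)
train_pick = select_layer(states, training=True)   # a random layer's output
infer_pick = select_layer(states, training=False)  # the last layer's output
```

Sampling the layer per step acts as a regularizer: shallower layers carry more lexical information and deeper layers more semantic information, and random selection prevents the NMT model from over-fitting to a single depth.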
Model Capabilities
English-to-German machine translation
German-to-English machine translation
Contextual embedding generation
Use Cases
Machine Translation
IWSLT'14 Dataset Translation
Achieves BLEU scores of 30.45 for English→German and 38.61 for German→English on the IWSLT'14 dataset.
Surpasses all previously published results on this benchmark.
WMT'14 Dataset Translation
Achieves BLEU scores of 31.26 for English→German and 34.94 for German→English on the WMT'14 dataset.
Surpasses all previously published results on this benchmark.