Quality Estimation for Machine Translation
This model provides reference-free quality estimation (QE) for machine translation (MT) systems. It is a fine-tuned version of [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) on the ymoslem/wmt-da-human-evaluation dataset, and predicts a quality score for a source/translation pair without requiring a reference translation.
Quick Start
Installation
- Install the required libraries:

```bash
pip3 install --upgrade datasets accelerate transformers
pip3 install --upgrade flash_attn triton
```
Inference Steps
- Load the test dataset:

```python
from datasets import load_dataset

test_dataset = load_dataset("ymoslem/wmt-da-human-evaluation",
                            split="test",
                            trust_remote_code=True,
)
print(test_dataset)
```
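As an optional sanity check (the steps below assume the split exposes at least the `src` and `mt` columns), you can inspect the available columns and a sample row:

```python
# Optional sanity check: list the columns and look at one example row
print(test_dataset.column_names)
print(test_dataset[0])
```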
- Load the model and tokenizer:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned QE model in bfloat16 with FlashAttention-2
model_name = "ymoslem/ModernBERT-large-qe-v1"
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Move the model to the GPU if one is available and switch to evaluation mode
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()
```
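The `flash_attention_2` implementation requires a CUDA GPU and the `flash_attn` package installed above. If either is unavailable, a reasonable fallback (a sketch, not part of the original instructions) is PyTorch's built-in scaled dot-product attention:

```python
# Fallback sketch: load the model with PyTorch's SDPA backend instead of flash_attn
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",
)
```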
- Prepare the dataset. Each source segment `src` and target segment `tgt` are separated by the tokenizer's `sep_token`, which is `'</s>'` for ModernBERT:

```python
sep_token = tokenizer.sep_token
input_test_texts = [f"{src} {sep_token} {tgt}" for src, tgt in zip(test_dataset["src"], test_dataset["mt"])]
```
- Generate predictions.
Basic Usage
If you print `model.config.problem_type`, the output is `regression`. Still, you can use the "text-classification" pipeline as follows (cf. the pipeline documentation):
```python
from transformers import pipeline

classifier = pipeline("text-classification",
                      model=model_name,
                      tokenizer=tokenizer,
                      device=0,
)

predictions = classifier(input_test_texts,
                         batch_size=128,
                         truncation=True,
                         padding="max_length",
                         max_length=tokenizer.model_max_length,
)

# Each pipeline output is a dict; keep only the predicted quality score
predictions = [prediction["score"] for prediction in predictions]
```
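With the scores in hand, one practical follow-up is to surface the segments the model rates lowest. This is a sketch: it assumes `predictions` is aligned one-to-one with `test_dataset`.

```python
# Rank segments by predicted quality and show the five lowest-scoring translations
worst = sorted(
    zip(predictions, test_dataset["src"], test_dataset["mt"]),
    key=lambda item: item[0],
)[:5]
for score, src, mt in worst:
    print(f"{score:.3f}\t{src}\t{mt}")
```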
Advanced Usage
Alternatively, you can use a more elaborate version of the code, which is slightly faster and gives you more control over tokenization and batching.
```python
from torch.utils.data import DataLoader
import torch
from tqdm.auto import tqdm

def process_batch(batch, tokenizer, device):
    # Join each source/translation pair with the separator token and tokenize the batch
    sep_token = tokenizer.sep_token
    input_texts = [f"{src} {sep_token} {tgt}" for src, tgt in zip(batch["src"], batch["mt"])]
    tokens = tokenizer(input_texts,
                       truncation=True,
                       padding="max_length",
                       max_length=tokenizer.model_max_length,
                       return_tensors="pt",
    ).to(device)
    return tokens


test_dataloader = DataLoader(test_dataset,
                             batch_size=128,
                             shuffle=False)

predictions = []

with torch.no_grad():
    for batch in tqdm(test_dataloader, desc="Inference Progress", unit="batch"):
        tokens = process_batch(batch, tokenizer, device)

        # Forward pass: the regression head returns one logit per input, i.e. the quality score
        outputs = model(**tokens)
        logits = outputs.logits

        batch_predictions = logits.squeeze()
        predictions.extend(batch_predictions.tolist())
```
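To compute the kind of metrics reported in the Technical Details section (Pearson correlation, MAE, RMSE, R-squared), you can compare the predictions against the human judgments. This is a sketch: it assumes the gold scores are stored in a column named `score` and that scipy and scikit-learn are installed.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Assumption: the human quality scores live in the "score" column of the test split
gold = np.array(test_dataset["score"], dtype=float)
preds = np.array(predictions, dtype=float)

pearson, _ = pearsonr(preds, gold)
mae = mean_absolute_error(gold, preds)
rmse = np.sqrt(mean_squared_error(gold, preds))
r2 = r2_score(gold, preds)

print(f"Pearson: {pearson:.3f}  MAE: {mae:.3f}  RMSE: {rmse:.3f}  R^2: {r2:.3f}")
```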
Features
- Multilingual Support: Supports multiple languages including bn, cs, de, en, et, fi, fr, gu, ha, hi, is, ja, kk, km, lt, lv, pl, ps, ru, ta, tr, uk, xh, zh, zu.
- Reference-Free QE: Enables quality estimation of machine translation systems without the need for reference translations (see the sketch below).
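As a minimal illustration of reference-free scoring, a single source/translation pair can be scored directly with the pipeline. The sentence pair below is a made-up example, not taken from the dataset, and `device=0` assumes a GPU (omit it to run on CPU):

```python
from transformers import pipeline

classifier = pipeline("text-classification",
                      model="ymoslem/ModernBERT-large-qe-v1",
                      device=0,
)

# Hypothetical German source and its English machine translation
src = "Der Bericht wurde gestern veröffentlicht."
mt = "The report was published yesterday."

sep_token = classifier.tokenizer.sep_token
print(classifier(f"{src} {sep_token} {mt}"))  # one dict whose "score" is the predicted quality
```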
Documentation
Model description
This model is for reference-free quality estimation (QE) of machine translation (MT) systems.
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them to `TrainingArguments` follows the list):
- learning_rate: 8e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- training_steps: 20000
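For orientation, the listed hyperparameters correspond roughly to the following `TrainingArguments`. This is a sketch, not the original training script; anything not listed above (the output directory, `bf16`) is an assumption.

```python
from transformers import TrainingArguments

# Approximate mapping of the reported hyperparameters (not the original training script)
training_args = TrainingArguments(
    output_dir="modernbert-large-qe",   # hypothetical output path
    learning_rate=8e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    optim="adamw_torch_fused",
    lr_scheduler_type="linear",
    max_steps=20000,
    bf16=True,                          # assumption, matching the bfloat16 inference setup
)
```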
Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.0743 | 0.0502 | 1000 | 0.0598 |
| 0.0853 | 0.1004 | 2000 | 0.0745 |
| 0.0829 | 0.1506 | 3000 | 0.0726 |
| 0.0814 | 0.2008 | 4000 | 0.0872 |
| 0.0805 | 0.2509 | 5000 | 0.0715 |
| 0.0782 | 0.3011 | 6000 | 0.0819 |
| 0.0789 | 0.3513 | 7000 | 0.0733 |
| 0.0791 | 0.4015 | 8000 | 0.0748 |
| 0.0787 | 0.4517 | 9000 | 0.0759 |
| 0.0761 | 0.5019 | 10000 | 0.0725 |
| 0.0746 | 0.5521 | 11000 | 0.0745 |
| 0.0762 | 0.6023 | 12000 | 0.0750 |
| 0.077 | 0.6524 | 13000 | 0.0725 |
| 0.0777 | 0.7026 | 14000 | 0.0737 |
| 0.0764 | 0.7528 | 15000 | 0.0745 |
| 0.0781 | 0.8030 | 16000 | 0.0750 |
| 0.0748 | 0.8532 | 17000 | 0.0765 |
| 0.0768 | 0.9034 | 18000 | 0.0750 |
| 0.0737 | 0.9536 | 19000 | 0.0759 |
| 0.0769 | 1.0038 | 20000 | 0.0752 |
Framework versions
- Transformers 4.48.0
- Pytorch 2.4.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
Technical Details
This model is a fine-tuned version of [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large). It achieves the following results on the evaluation set:
- Loss: 0.0752
- Metrics:
- Pearson Correlation: 0.422
- Mean Absolute Error: 0.196
- Root Mean Squared Error: 0.245
- R-Squared: 0.245
License
This project is licensed under the MIT license.
Model Information
| Property | Details |
|----------|---------|
| Model Type | Quality Estimation for Machine Translation |
| Base Model | FacebookAI/xlm-roberta-large |
| Training Data | ymoslem/wmt-da-human-evaluation |
| Metrics | perplexity, mae, r_squared |
| Tags | quality-estimation, regression, generated_from_trainer |