# Quality Estimation for Machine Translation
This model performs reference-free quality estimation of machine translation: it predicts a segment-level quality score directly from the source text and its machine translation, with no human reference required.
## Quick Start
This model is a fine-tuned version of answerdotai/ModernBERT-large on the ymoslem/wmt-da-human-evaluation dataset. It achieves the following results on the evaluation set:
- Loss: 0.0564
## Features
This model is for reference-free quality estimation (QE) of machine translation (MT) systems.
## Installation
- Install the required libraries.
pip3 install --upgrade datasets accelerate transformers
pip3 install --upgrade flash_attn triton
## Usage Examples
### Basic Usage
- Load the test dataset.
from datasets import load_dataset

test_dataset = load_dataset("ymoslem/wmt-da-human-evaluation",
                            split="test",
                            trust_remote_code=True
                            )
print(test_dataset)
- Load the model and tokenizer:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "ymoslem/ModernBERT-large-qe-maxlen512-v1"
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()
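If `flash_attn` is not available (for example on CPU or an unsupported GPU), the model can still be loaded with a different attention backend. A minimal sketch, assuming the `sdpa` implementation is supported for ModernBERT in your `transformers` version (otherwise drop the `attn_implementation` argument entirely):

```python
# Fallback: load without FlashAttention 2 (assumption: PyTorch SDPA attention
# is available for ModernBERT in your transformers version).
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",
)
```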
- Prepare the dataset. Each source segment `src` and target segment `tgt` are separated by the `sep_token`, which is `'</s>'` for ModernBERT.
sep_token = tokenizer.sep_token
input_test_texts = [f"{src} {sep_token} {tgt}" for src, tgt in zip(test_dataset["src"], test_dataset["mt"])]
- Generate predictions.
from transformers import pipeline

classifier = pipeline("text-classification",
                      model=model_name,
                      tokenizer=tokenizer,
                      device=0,
                      )

predictions = classifier(input_test_texts,
                         batch_size=128,
                         truncation=True,
                         padding="max_length",
                         max_length=tokenizer.model_max_length,
                         )
predictions = [prediction["score"] for prediction in predictions]
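To sanity-check the output, the predicted scores can be printed next to their source/translation pairs. A small sketch (the slice of 3 segments is arbitrary):

```python
# Print the first few predictions alongside the corresponding segments.
for src, mt, score in zip(test_dataset["src"][:3], test_dataset["mt"][:3], predictions[:3]):
    print(f"score={score:.4f}\tsrc={src}\tmt={mt}")
```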
### Advanced Usage
from torch.utils.data import DataLoader
import torch
from tqdm.auto import tqdm


def process_batch(batch, tokenizer, device):
    # Join each source/target pair with the separator token and tokenize.
    sep_token = tokenizer.sep_token
    input_texts = [f"{src} {sep_token} {tgt}" for src, tgt in zip(batch["src"], batch["mt"])]
    tokens = tokenizer(input_texts,
                       truncation=True,
                       padding="max_length",
                       max_length=tokenizer.model_max_length,
                       return_tensors="pt",
                       ).to(device)
    return tokens


test_dataloader = DataLoader(test_dataset,
                             batch_size=128,
                             shuffle=False)

predictions = []

with torch.no_grad():
    for batch in tqdm(test_dataloader, desc="Inference Progress", unit="batch"):
        tokens = process_batch(batch, tokenizer, device)
        outputs = model(**tokens)
        logits = outputs.logits
        # Drop the single regression dimension but keep the batch dimension,
        # so the last (possibly size-1) batch is still iterable.
        batch_predictions = logits.squeeze(dim=1)
        predictions.extend(batch_predictions.tolist())
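The scores can then be stored with the test set for later analysis, for example as an extra column (a sketch; the column name `prediction` and the output file name are arbitrary):

```python
# Attach the predictions to the dataset and export them (names are arbitrary).
test_dataset = test_dataset.add_column("prediction", predictions)
test_dataset.to_csv("wmt-da-qe-predictions.csv")
```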
## Documentation
### Training procedure
#### Training hyperparameters
This version of the model uses `tokenizer.model_max_length = 512`. The model with the full context length of 8192 can be found at ymoslem/ModernBERT-large-qe-v1.
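If you want to see how often the 512-token limit actually triggers truncation on your data, a quick check along these lines (reusing the `tokenizer` and `input_test_texts` from the usage example above) can help:

```python
# Count how many source/target pairs exceed the 512-token context window.
lengths = [len(ids) for ids in tokenizer(input_test_texts)["input_ids"]]
n_truncated = sum(length > tokenizer.model_max_length for length in lengths)
print(f"{n_truncated} of {len(lengths)} pairs would be truncated at {tokenizer.model_max_length} tokens")
```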
The following hyperparameters were used during training (a rough configuration sketch follows this list):
- learning_rate: 8e-05
- train_batch_size: 128
- eval_batch_size: 128
- seed: 42
- optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- training_steps: 10000
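The training script itself is not part of this card, but the hyperparameters above roughly map to a `TrainingArguments` configuration like the following (a hedged sketch, not the authors' actual setup; `output_dir`, the evaluation cadence, and `bf16` are assumptions):

```python
from transformers import TrainingArguments

# Sketch of a TrainingArguments object matching the reported hyperparameters.
# output_dir, eval cadence, and bf16 are assumptions, not taken from the card.
training_args = TrainingArguments(
    output_dir="modernbert-large-qe",   # assumed
    learning_rate=8e-5,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    seed=42,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=10000,
    eval_strategy="steps",              # assumed from the 1000-step eval cadence below
    eval_steps=1000,                    # assumed
    bf16=True,                          # assumed
)
```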
#### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.0631 | 0.1004 | 1000 | 0.0674 |
| 0.0614 | 0.2007 | 2000 | 0.0599 |
| 0.0578 | 0.3011 | 3000 | 0.0585 |
| 0.0585 | 0.4015 | 4000 | 0.0579 |
| 0.0568 | 0.5019 | 5000 | 0.0570 |
| 0.057 | 0.6022 | 6000 | 0.0568 |
| 0.0579 | 0.7026 | 7000 | 0.0567 |
| 0.0573 | 0.8030 | 8000 | 0.0565 |
| 0.0568 | 0.9033 | 9000 | 0.0564 |
| 0.0571 | 1.0037 | 10000 | 0.0564 |
#### Framework versions
- Transformers 4.48.0
- Pytorch 2.4.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
### Model Information
| Property | Details |
|---|---|
| Library Name | transformers |
| Supported Languages | multilingual, bn, cs, de, en, et, fi, fr, gu, ha, hi, is, ja, kk, km, lt, lv, pl, ps, ru, ta, tr, uk, xh, zh, zu |
| License | apache-2.0 |
| Base Model | answerdotai/ModernBERT-large |
| Tags | quality-estimation, regression, generated_from_trainer |
| Datasets | ymoslem/wmt-da-human-evaluation |
| New Version | ymoslem/ModernBERT-large-qe-v1 |
### Model Performance
The model has the following performance metrics on the ymoslem/wmt-da-human-evaluation dataset:
| Metric Name | Metric Type | Value |
|---|---|---|
| Pearson Correlation | Pearson | 0.4589 |
| Mean Absolute Error | MAE | 0.1861 |
| Root Mean Squared Error | RMSE | 0.2375 |
| R-Squared | R2 | 0.2106 |
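These numbers can be recomputed from the predictions generated in the usage examples above, assuming the gold human scores are in the dataset's `score` column (the column name is an assumption; adjust it to the actual field) and that `scipy` and `scikit-learn` are installed in addition to the packages listed earlier:

```python
# Sketch: recompute Pearson, MAE, RMSE, and R2 from predictions vs. gold scores.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array(test_dataset["score"], dtype=float)  # assumed column name
y_pred = np.array(predictions, dtype=float)

print(f"Pearson: {pearsonr(y_true, y_pred)[0]:.4f}")
print(f"MAE:     {mean_absolute_error(y_true, y_pred):.4f}")
print(f"RMSE:    {np.sqrt(mean_squared_error(y_true, y_pred)):.4f}")
print(f"R2:      {r2_score(y_true, y_pred):.4f}")
```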
## License
This project is licensed under the Apache-2.0 license.