MeaningBERT
MeaningBERT is an automatic and trainable metric designed to assess meaning preservation between sentences. It aims to produce scores that correlate highly with human judgments while passing automated sanity checks. For more details, refer to our publicly available article.
Quick Start
MeaningBERT can be used in multiple ways. You can either use it as a model for retraining or inference, or as a metric for evaluation.
Use as a Model
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("davebulaval/MeaningBERT")
model = AutoModelForSequenceClassification.from_pretrained("davebulaval/MeaningBERT")
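For retraining, a minimal fine-tuning sketch follows. It assumes the datasets library is installed and that you have your own document/simplification pairs annotated with continuous meaning-preservation ratings on a 0-100 scale; the example pair, rating, and training arguments below are purely illustrative, not part of the released training setup.

from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("davebulaval/MeaningBERT")
model = AutoModelForSequenceClassification.from_pretrained("davebulaval/MeaningBERT")

# Hypothetical training pair and rating (0 = meaning lost, 100 = meaning preserved)
train_data = Dataset.from_dict({
    "document": ["He wanted to make them pay."],
    "simplification": ["He wanted them to pay."],
    "label": [85.0],
})

def tokenize(batch):
    # Tokenize the document/simplification pairs together, as the model expects
    return tokenizer(batch["document"], batch["simplification"],
                     truncation=True, padding="max_length")

train_data = train_data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="meaningbert-finetuned", num_train_epochs=1),
    train_dataset=train_data,
)
trainer.train()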
Use as a Metric
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("davebulaval/MeaningBERT")
scorer = AutoModelForSequenceClassification.from_pretrained("davebulaval/MeaningBERT")
scorer.eval()
documents = ["He wanted to make them pay.", "This sandwich looks delicious.", "He wants to eat."]
simplifications = ["He wanted to make them pay.", "This sandwich looks delicious.",
"Whatever, whenever, this is a sentence."]
# Tokenize the document/simplification pairs together
tokenize_text = tokenizer(documents, simplifications, truncation=True, padding=True, return_tensors="pt")

with torch.no_grad():
    scores = scorer(**tokenize_text)

# Each prediction is a meaning-preservation score on a 0-100 scale
print(scores.logits.tolist())
Use the HuggingFace Metric Module
import evaluate
documents = ["He wanted to make them pay.", "This sandwich looks delicious.", "He wants to eat."]
simplifications = ["He wanted to make them pay.", "This sandwich looks delicious.",
"Whatever, whenever, this is a sentence."]
# Load MeaningBERT through the evaluate library
meaning_bert = evaluate.load("davebulaval/meaningbert")

print(meaning_bert.compute(references=documents, predictions=simplifications))
Features
- Objective Evaluation: MeaningBERT provides an objective way to assess meaning preservation between sentences, reducing the subjectivity associated with human judgment.
- Automated Sanity Checks: It includes two automated tests to ensure the metric's reliability and performance.
- Flexible Usage: Can be used as a model for retraining or inference, or as a metric for evaluation.
Technical Details
Sanity Check
Correlation with human judgment is a common way to evaluate meaning preservation metrics. However, collecting human judgments is subjective and expensive. As an alternative, we designed two automated tests:
Identical Sentences
This test evaluates meaning preservation between identical sentences. We calculate the ratio of times the metric rating is greater than or equal to a threshold value X ∈ [95, 99] to the total number of sentences. To account for computer floating-point inaccuracy, we round the ratings to the nearest integer and do not use a threshold value of 100%.
Unrelated Sentences
This test evaluates meaning preservation between a source sentence and an unrelated sentence generated by a large language model. We check that the metric rating is lower than or equal to a threshold value X ∈ [1, 5]. Again, we round the ratings to the nearest integer to account for floating-point inaccuracy and do not use a threshold value of 0%.
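Below is a minimal sketch of both sanity checks, assuming scorer and tokenizer are loaded and set to eval mode as in the Quick Start. The threshold values and the "unrelated" sentences here are only illustrative; in our evaluation the unrelated sentences are generated by a large language model.

import torch

def sanity_check_ratio(scorer, tokenizer, sources, targets, threshold, check="identical"):
    # Return the fraction of pairs that pass a sanity check.
    # check="identical": rounded score must be >= threshold (e.g. 95).
    # check="unrelated": rounded score must be <= threshold (e.g. 5).
    inputs = tokenizer(sources, targets, truncation=True, padding=True, return_tensors="pt")
    with torch.no_grad():
        scores = scorer(**inputs).logits.squeeze(-1)
    rounded = torch.round(scores)  # round to the nearest integer to absorb floating-point noise
    if check == "identical":
        passed = (rounded >= threshold).sum().item()
    else:
        passed = (rounded <= threshold).sum().item()
    return passed / len(sources)

# Identical-sentences check: scores between a sentence and itself should be near 100
sentences = ["He wanted to make them pay.", "This sandwich looks delicious."]
print(sanity_check_ratio(scorer, tokenizer, sentences, sentences, threshold=95, check="identical"))

# Unrelated-sentences check: scores against unrelated sentences should be near 0
unrelated = ["Whatever, whenever, this is a sentence.", "The moon orbits in silence."]
print(sanity_check_ratio(scorer, tokenizer, sentences, unrelated, threshold=5, check="unrelated"))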
Documentation
License
MeaningBERT is MIT licensed, as found in the LICENSE file.
Contributing
We welcome user input, whether it concerns bugs found in the library or feature proposals! Make sure to have a look at our contributing guidelines for more details on this matter.
Cite
Use the following citation to cite MeaningBERT:
@ARTICLE{10.3389/frai.2023.1223924,
AUTHOR={Beauchemin, David and Saggion, Horacio and Khoury, Richard},
TITLE={MeaningBERT: assessing meaning preservation between sentences},
JOURNAL={Frontiers in Artificial Intelligence},
VOLUME={6},
YEAR={2023},
URL={https://www.frontiersin.org/articles/10.3389/frai.2023.1223924},
DOI={10.3389/frai.2023.1223924},
ISSN={2624-8212},
}