AlignScoreCS
AlignScoreCS is a multi-task, multilingual model for assessing factual consistency in context-claim pairs across a wide range of Natural Language Understanding (NLU) tasks: Summarization, Question Answering (QA), Semantic Textual Similarity (STS), Paraphrase, Fact Verification (FV), and Natural Language Inference (NLI).
AlignScoreCS is fine-tuned on a massive multi-task dataset that consists of 7 million documents, covering these NLU tasks in both Czech and English. Thanks to its multilingual pre-training, it holds the potential to be used in numerous other languages. The architecture can handle tasks through regression, binary classification, or ternary classification. However, for evaluation, we suggest using the AlignScore function.
This work draws inspiration from its English counterpart AlignScore: Evaluating Factual Consistency with a Unified Alignment Function. Nevertheless, during training, we used homogeneous batches instead of heterogeneous ones and employed three distinct architectures sharing a single encoder. This setup allows each architecture to be used independently with its classification head.
Quick Start
This section provides an overview of how to quickly get started with the AlignScoreCS model.
Features
- Multilingual Capability: Can assess factual consistency in Czech and English, with potential for other languages.
- Multi-Task Adaptability: Suitable for various NLU tasks such as Summarization, QA, STS, and more.
- Flexible Classification: Supports regression, binary classification, and ternary classification.
Installation
No dedicated installation steps are documented; the model weights are loaded directly from the Hugging Face Hub (krotima1/AlignScoreCS) as shown in the usage example below.
Usage Examples
Basic Usage
```python
from AlignScoreCS import AlignScoreCS

# Load the fine-tuned checkpoint from the Hugging Face Hub
alignScoreCS = AlignScoreCS.from_pretrained("krotima1/AlignScoreCS")

# Score how well the claim is supported by the context
print(alignScoreCS.score(context="This is context", claim="This is claim"))
```
Documentation
Evaluation
Similar to the AlignScore paper, we use its AlignScore function: the context is split into chunks of roughly 350 tokens and the claim is split into sentences. Each context chunk is evaluated against each claim sentence, and the resulting scores are aggregated into a single consistency score.
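The chunk-and-aggregate scheme above can be sketched as follows. This is a minimal illustration, not the released code: `score_pair` is a hypothetical stand-in for the model's per-pair alignment probability, and the max-over-chunks, mean-over-sentences aggregation follows the AlignScore paper.

```python
from typing import Callable, List

def align_score(context_chunks: List[str],
                claim_sentences: List[str],
                score_pair: Callable[[str, str], float]) -> float:
    """For each claim sentence, take the best (max) alignment over all
    context chunks, then average those maxima over the sentences."""
    per_sentence = [
        max(score_pair(chunk, sentence) for chunk in context_chunks)
        for sentence in claim_sentences
    ]
    return sum(per_sentence) / len(per_sentence)
```

The max step lets a claim sentence be supported by any single chunk, while the mean step requires every claim sentence to find support somewhere in the context.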
The AlignScoreCS model is built on three XLM-RoBERTa architectures that share one encoder. It is a multi-task multilingual model for assessing factual consistency across various NLU tasks in Czech and English. We followed the original AlignScore paper (https://arxiv.org/abs/2305.16739) and trained the model from the [xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) checkpoint with three linear heads for regression, binary classification, and ternary classification.
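The shared-encoder, three-head setup can be sketched in PyTorch as follows. This is an assumption-laden illustration, not the released training code: the class name, head names, and pooling choice are hypothetical, and the encoder is passed in so any Hugging Face body (e.g. xlm-roberta-large) can be plugged in.

```python
import torch
import torch.nn as nn

class MultiHeadAligner(nn.Module):
    """Sketch: one shared encoder feeding three task-specific linear
    heads, each usable independently (as described in the text)."""

    def __init__(self, encoder: nn.Module, hidden_size: int):
        super().__init__()
        self.encoder = encoder  # shared body, e.g. XLM-RoBERTa
        self.heads = nn.ModuleDict({
            "reg":     nn.Linear(hidden_size, 1),  # STS-style regression
            "binary":  nn.Linear(hidden_size, 2),  # 2-way tasks
            "ternary": nn.Linear(hidden_size, 3),  # 3-way NLI tasks
        })

    def forward(self, task: str, **inputs) -> torch.Tensor:
        # Represent the context-claim pair by the first (<s>) token.
        pooled = self.encoder(**inputs).last_hidden_state[:, 0]
        return self.heads[task](pooled)

# The real encoder would be loaded with, for example:
#   from transformers import AutoModel
#   encoder = AutoModel.from_pretrained("FacebookAI/xlm-roberta-large")
#   model = MultiHeadAligner(encoder, encoder.config.hidden_size)
```

Because the heads are independent linear layers on a shared representation, homogeneous batches (one task per batch, as the text notes) route each batch through exactly one head.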
Training datasets
The following table presents the datasets used for training the model. We translated these English datasets to Czech using SeamlessM4T.
| Property | Details |
| --- | --- |
| Model Type | Multi-task multilingual model |
| Training Data | The model is trained on a dataset of 7 million documents covering various NLU tasks in Czech and English. The specific datasets include SNLI, MultiNLI, Adversarial NLI, DocNLI, NLI-style FEVER, Vitamin C, QQP, PAWS, PAWS labeled, PAWS unlabeled, SICK, STS Benchmark, Free-N1, SQuAD v2, RACE, MS MARCO, WikiHow, and SumAug. |
| NLP Task | Dataset | Training Task | Context (n words) | Claim (n words) | Sample Count |
| --- | --- | --- | --- | --- | --- |
| NLI | SNLI | 3-way | 10 | 13 | Cs: 500k, En: 550k |
| NLI | MultiNLI | 3-way | 16 | 20 | Cs: 393k, En: 393k |
| NLI | Adversarial NLI | 3-way | 48 | 54 | Cs: 163k, En: 163k |
| NLI | DocNLI | 2-way | 97 | 285 | Cs: 200k, En: 942k |
| Fact Verification | NLI-style FEVER | 3-way | 48 | 50 | Cs: 208k, En: 208k |
| Fact Verification | Vitamin C | 3-way | 23 | 25 | Cs: 371k, En: 371k |
| Paraphrase | QQP | 2-way | 9 | 11 | Cs: 162k, En: 364k |
| Paraphrase | PAWS | 2-way | - | 18 | Cs: -, En: 707k |
| Paraphrase | PAWS labeled | 2-way | 18 | - | Cs: 49k, En: - |
| Paraphrase | PAWS unlabeled | 2-way | 18 | - | Cs: 487k, En: - |
| STS | SICK | reg | - | 10 | Cs: -, En: 4k |
| STS | STS Benchmark | reg | - | 10 | Cs: -, En: 6k |
| STS | Free-N1 | reg | 18 | - | Cs: 20k, En: - |
| QA | SQuAD v2 | 2-way | 105 | 119 | Cs: 130k, En: 130k |
| QA | RACE | 2-way | 266 | 273 | Cs: 200k, En: 351k |
| Information Retrieval | MS MARCO | 2-way | 49 | 56 | Cs: 200k, En: 5M |
| Summarization | WikiHow | 2-way | 434 | 508 | Cs: 157k, En: 157k |
| Summarization | SumAug | 2-way | - | - | Cs: -, En: - |
License
This work is licensed under CC-BY-4.0.