AlignScoreCS
AlignScoreCS is a multi-task, multilingual model for assessing factual consistency in context-claim pairs across a wide range of Natural Language Understanding (NLU) tasks: Summarization, Question Answering (QA), Semantic Textual Similarity (STS), Paraphrase, Fact Verification (FV), and Natural Language Inference (NLI).
AlignScoreCS is fine-tuned on a massive multi-task dataset that consists of 7 million documents, covering these NLU tasks in both Czech and English. Thanks to its multilingual pre-training, it holds the potential to be used in numerous other languages. The architecture can handle tasks through regression, binary classification, or ternary classification. However, for evaluation, we suggest using the AlignScore function.
This work draws inspiration from its English counterpart AlignScore: Evaluating Factual Consistency with a Unified Alignment Function. Nevertheless, during training, we used homogeneous batches instead of heterogeneous ones and employed three distinct architectures sharing a single encoder. This setup allows each architecture to be used independently with its classification head.
Quick Start
This section provides an overview of how to quickly get started with the AlignScoreCS model.
Features
- Multilingual Capability: Can assess factual consistency in Czech and English, with potential for other languages.
- Multi-Task Adaptability: Suitable for various NLU tasks such as Summarization, QA, STS, and more.
- Flexible Classification: Supports regression, binary classification, and ternary classification.
Installation
No dedicated installation steps are documented; the model weights are loaded directly from the Hugging Face Hub (krotima1/AlignScoreCS) as shown in the usage example below.
Usage Examples
Basic Usage
```python
from AlignScoreCS import AlignScoreCS

# Load the fine-tuned checkpoint from the Hugging Face Hub
alignScoreCS = AlignScoreCS.from_pretrained("krotima1/AlignScoreCS")

# Score how well the claim is supported by the context
print(alignScoreCS.score(context="This is context", claim="This is claim"))
```
Documentation
Evaluation
Similar to the AlignScore paper, we use its AlignScore function: the context is split into chunks of roughly 350 tokens and the claim is split into sentences. Each context chunk is evaluated against each claim sentence, and the resulting scores are aggregated into a single consistency score.
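The chunk-and-aggregate scheme above can be sketched as follows. This is a minimal illustration, not the released code: `score_pair` is a hypothetical stand-in for the model's per-pair alignment probability, and the max-over-chunks, mean-over-sentences aggregation follows the AlignScore paper.

```python
from typing import Callable, List

def align_score(context_chunks: List[str],
                claim_sentences: List[str],
                score_pair: Callable[[str, str], float]) -> float:
    """For each claim sentence, take the best (max) alignment over all
    context chunks, then average those maxima over the sentences."""
    per_sentence = [
        max(score_pair(chunk, sentence) for chunk in context_chunks)
        for sentence in claim_sentences
    ]
    return sum(per_sentence) / len(per_sentence)
```

The max step lets a claim sentence be supported by any single chunk, while the mean step requires every claim sentence to find support somewhere in the context.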
The AlignScoreCS model is built on three XLM-RoBERTa architectures that share one encoder. It is a multi-task multilingual model for assessing factual consistency across various NLU tasks in Czech and English. We followed the original AlignScore paper (https://arxiv.org/abs/2305.16739) and trained the model from the [xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) checkpoint with three linear heads for regression, binary classification, and ternary classification.
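The shared-encoder, three-head setup can be sketched in PyTorch as follows. This is an assumption-laden illustration, not the released training code: the class name, head names, and pooling choice are hypothetical, and the encoder is passed in so any Hugging Face body (e.g. xlm-roberta-large) can be plugged in.

```python
import torch
import torch.nn as nn

class MultiHeadAligner(nn.Module):
    """Sketch: one shared encoder feeding three task-specific linear
    heads, each usable independently (as described in the text)."""

    def __init__(self, encoder: nn.Module, hidden_size: int):
        super().__init__()
        self.encoder = encoder  # shared body, e.g. XLM-RoBERTa
        self.heads = nn.ModuleDict({
            "reg":     nn.Linear(hidden_size, 1),  # STS-style regression
            "binary":  nn.Linear(hidden_size, 2),  # 2-way tasks
            "ternary": nn.Linear(hidden_size, 3),  # 3-way NLI tasks
        })

    def forward(self, task: str, **inputs) -> torch.Tensor:
        # Represent the context-claim pair by the first (<s>) token.
        pooled = self.encoder(**inputs).last_hidden_state[:, 0]
        return self.heads[task](pooled)

# The real encoder would be loaded with, for example:
#   from transformers import AutoModel
#   encoder = AutoModel.from_pretrained("FacebookAI/xlm-roberta-large")
#   model = MultiHeadAligner(encoder, encoder.config.hidden_size)
```

Because the heads are independent linear layers on a shared representation, homogeneous batches (one task per batch, as the text notes) route each batch through exactly one head.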
Training datasets
The following table presents the datasets used for training the model. We translated these English datasets to Czech using SeamlessM4T.
| Property | Details |
| --- | --- |
| Model Type | Multi-task multilingual model |
| Training Data | The model is trained on a dataset of 7 million documents covering various NLU tasks in Czech and English. The specific datasets include SNLI, MultiNLI, Adversarial NLI, DocNLI, NLI-style FEVER, Vitamin C, QQP, PAWS, PAWS labeled, PAWS unlabeled, SICK, STS Benchmark, Free-N1, SQuAD v2, RACE, MS MARCO, WikiHow, and SumAug. |
| NLP Task | Dataset | Training Task | Context (n words) | Claim (n words) | Sample Count |
| --- | --- | --- | --- | --- | --- |
| NLI | SNLI | 3-way | 10 | 13 | Cs: 500k, En: 550k |
| NLI | MultiNLI | 3-way | 16 | 20 | Cs: 393k, En: 393k |
| NLI | Adversarial NLI | 3-way | 48 | 54 | Cs: 163k, En: 163k |
| NLI | DocNLI | 2-way | 97 | 285 | Cs: 200k, En: 942k |
| Fact Verification | NLI-style FEVER | 3-way | 48 | 50 | Cs: 208k, En: 208k |
| Fact Verification | Vitamin C | 3-way | 23 | 25 | Cs: 371k, En: 371k |
| Paraphrase | QQP | 2-way | 9 | 11 | Cs: 162k, En: 364k |
| Paraphrase | PAWS | 2-way | - | 18 | Cs: -, En: 707k |
| Paraphrase | PAWS labeled | 2-way | 18 | - | Cs: 49k, En: - |
| Paraphrase | PAWS unlabeled | 2-way | 18 | - | Cs: 487k, En: - |
| STS | SICK | reg | - | 10 | Cs: -, En: 4k |
| STS | STS Benchmark | reg | - | 10 | Cs: -, En: 6k |
| STS | Free-N1 | reg | 18 | - | Cs: 20k, En: - |
| QA | SQuAD v2 | 2-way | 105 | 119 | Cs: 130k, En: 130k |
| QA | RACE | 2-way | 266 | 273 | Cs: 200k, En: 351k |
| Information Retrieval | MS MARCO | 2-way | 49 | 56 | Cs: 200k, En: 5M |
| Summarization | WikiHow | 2-way | 434 | 508 | Cs: 157k, En: 157k |
| Summarization | SumAug | 2-way | - | - | Cs: -, En: - |
License
This work is licensed under CC-BY-4.0.