# ComVE-gpt2
A model fine-tuned on the Commonsense Validation and Explanation (ComVE) dataset that generates reasons why statements go against commonsense.
## 🚀 Quick Start
You can use this model directly to generate reasons why a given statement goes against commonsense using the `generate.sh` script.
## ⚠️ Important Note
Make sure you are using version 2.4.1 of the `transformers` package. Newer versions have issues in text generation that can cause the model to repeat the last generated token over and over.
## ✨ Features
- Fine-tuned on the Commonsense Validation and Explanation (ComVE) dataset using a causal language modeling (CLM) objective.
- Capable of generating a reason why a given natural language statement is against commonsense.
## 📦 Installation
No dedicated installation steps are required beyond the `transformers` package itself; given the note above, install the pinned version, e.g. `pip install transformers==2.4.1`.
## 💻 Usage Examples
### Basic Usage
You can use the raw model for text generation to produce reasons why natural language statements go against commonsense. For example, run the provided script:
```bash
bash generate.sh
```
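If you prefer to call the model from Python instead of the shell script, the following minimal sketch shows one way to do it. It assumes the checkpoint is available under the model id `aliosm/ComVE-gpt2` (adjust to wherever the weights are stored), that `transformers` 2.4.1 is installed as advised above, and that the `<|continue|>` separator described under Training procedure below marks where the generated reason starts; the exact prompting used by `generate.sh` may differ.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_name = "aliosm/ComVE-gpt2"  # assumed model id; point this at wherever the checkpoint lives
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()

# An against-commonsense statement followed by the separator used during fine-tuning;
# everything generated after the separator is taken as the reason.
statement = "He put an elephant into the fridge."
prompt = statement + " <|continue|> "
input_ids = tokenizer.encode(prompt, return_tensors="pt")

output_ids = model.generate(
    input_ids,
    max_length=64,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
reason = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(reason)
```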
## 📚 Documentation
### Model description
This is a fine-tuned model on the Commonsense Validation and Explanation (ComVE) dataset introduced in SemEval-2020 Task 4, trained with a causal language modeling (CLM) objective. The model can generate a reason why a given natural language statement goes against commonsense.
### Intended uses & limitations
- Intended uses: Use the raw model for text generation to produce reasons why natural language statements go against commonsense.
- Limitations and bias: The model is biased toward simply negating the input sentence rather than producing a factual reason.
### Training data
The model is initialized from the `gpt2` model and fine-tuned on the ComVE dataset, which contains 10K against-commonsense sentences, each paired with three reference reasons.
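Conceptually, each training record pairs one against-commonsense statement with its three reference reasons, along these lines (the field names and reasons below are illustrative, not the dataset's actual schema):

```python
example = {
    "statement": "He put an elephant into the fridge.",   # against-commonsense statement
    "reference_reasons": [                                 # three human-written reasons
        "An elephant is much bigger than a fridge.",
        "A fridge is too small to hold an elephant.",
        "Elephants cannot fit inside fridges.",
    ],
}
```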
### Training procedure
Each natural language statement that goes against commonsense is concatenated with its reference reason using `<|continue|>` as a separator, and the model is then fine-tuned with the CLM objective. Training was done on an Nvidia Tesla P100 GPU from the Google Colab platform with a learning rate of 5e-5, 5 epochs, a maximum sequence length of 128, and a batch size of 64.
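The following sketch illustrates how a single training example could be assembled under that description; it is not the authors' exact preprocessing code, and treating `<|continue|>` as an added special token is an assumption.

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# Treating <|continue|> as an added special token is an assumption; the authors
# may simply have inserted it as plain text between statement and reason.
tokenizer.add_special_tokens({"additional_special_tokens": ["<|continue|>"]})

statement = "He put an elephant into the fridge."      # against-commonsense statement
reason = "An elephant is much bigger than a fridge."   # one of its three reference reasons

# Statement and reason are concatenated with the separator; under the CLM objective
# the labels are the input ids themselves, so the model learns to continue the
# statement with a plausible reason after <|continue|>.
text = statement + " <|continue|> " + reason + tokenizer.eos_token
input_ids = tokenizer.encode(text, max_length=128)      # max sequence length used in training
```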
### Eval results
The model achieves BLEU scores of 14.0547 and 13.6534 on the SemEval-2020 Task 4 (Commonsense Validation and Explanation) development and test sets, respectively.
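For reference, multi-reference BLEU against the three reasons per example can be approximated with NLTK as below; the official SemEval-2020 Task 4 scorer may tokenize and smooth differently, so this is only a rough recipe, not a way to reproduce the exact numbers above.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Illustrative data: one generated reason scored against its three reference reasons.
generated_reasons = ["an elephant is too big to fit in a fridge"]
reference_sets = [[
    "an elephant is much bigger than a fridge",
    "a fridge is too small to hold an elephant",
    "elephants cannot fit inside fridges",
]]

hypotheses = [gen.split() for gen in generated_reasons]
references = [[ref.split() for ref in refs] for refs in reference_sets]
print(corpus_bleu(references, hypotheses, smoothing_function=SmoothingFunction().method1))
```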
### BibTeX entry and citation info
```bibtex
@article{fadel2020justers,
  title={JUSTers at SemEval-2020 Task 4: Evaluating Transformer Models Against Commonsense Validation and Explanation},
  author={Fadel, Ali and Al-Ayyoub, Mahmoud and Cambria, Erik},
  year={2020}
}
```
## 🔧 Technical Details
The model is fine-tuned on the ComVE dataset with a CLM objective, using `gpt2` as the base model; the hardware and hyperparameters are listed under Training procedure above.
## 📄 License
This project is licensed under the MIT license.