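🚀 Distractor generation model: t5-large fine-tuned on the RACE dataset
This model is t5-large fine-tuned on the RACE dataset. It takes a question, an answer, and a context as input and generates three distractors (incorrect answer options).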
🚀 Quick Start
Input format
- Input:
question <sep> answer <sep> context
Output format
- Output:
distractor1 <sep> distractor2 <sep> distractor3
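The separator-delimited layout can be sketched with two small string helpers. This is only an illustrative sketch: the names `build_input` and `parse_output` are hypothetical and not part of the model or of transformers, and the literal `"<sep>"` is an assumption standing in for `tokenizer.sep_token`, which is what the usage example in this document actually uses.

```python
from typing import List

# Assumption: a literal separator for illustration only.
# With the real model, pass tokenizer.sep_token instead.
SEP = "<sep>"


def build_input(question: str, answer: str, context: str, sep: str = SEP) -> str:
    """Join question, answer, and context in the order the model expects."""
    return f" {sep} ".join([question, answer, context])


def parse_output(decoded: str, sep: str = SEP) -> List[str]:
    """Split a decoded generation into stripped distractor strings.

    Empty pieces (for example from a trailing separator) are dropped.
    """
    return [part.strip() for part in decoded.split(sep) if part.strip()]


if __name__ == "__main__":
    print(build_input("Who won?", "Djokovic", "Some passage text."))
    print(parse_output("option one <sep> option two <sep> option three"))
```

With the real tokenizer, the special pad and end-of-sequence tokens still have to be removed from the decoded string before splitting, as the usage example does.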
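✨ Key Features
The model is t5-large fine-tuned on the RACE dataset. It takes the concatenation of a question, an answer, and a context as input and outputs a list of three distractors. It is the second component (g2) of the question generation pipeline in the MQAG paper. The project's GitHub repository: https://github.com/potsawee/mqag0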
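Because the model produces distractors for a given question and answer, one plausible downstream step is to combine them with the correct answer into a shuffled option list. The sketch below is an assumption about how the output might be consumed, not the procedure used in the MQAG paper or repository; `make_options` and its `seed` parameter are hypothetical names introduced for illustration.

```python
import random
from typing import List, Tuple


def make_options(answer: str, distractors: List[str], seed: int = 0) -> Tuple[List[str], int]:
    """Mix the correct answer with the distractors.

    Returns the shuffled options and the index of the correct answer.
    """
    # Drop distractors that duplicate the answer or each other, keeping order.
    unique: List[str] = []
    for d in distractors:
        if d != answer and d not in unique:
            unique.append(d)

    options = [answer] + unique
    # A seeded generator keeps the shuffle reproducible (hypothetical design choice).
    random.Random(seed).shuffle(options)
    return options, options.index(answer)


if __name__ == "__main__":
    opts, correct = make_options("Paris", ["London", "Berlin", "Madrid"])
    for i, opt in enumerate(opts):
        marker = "*" if i == correct else " "
        print(f"{marker} {chr(ord('A') + i)}. {opt}")
```

Deduplicating first matters because a generation model can occasionally repeat an option or reproduce the correct answer, which would otherwise yield an ambiguous question.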
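📦 Installation
This model requires the transformers library. The usage example also needs PyTorch, because it requests PyTorch tensors (return_tensors="pt"). Install transformers with the following command: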
pip install transformers
💻 Usage Examples
Basic usage
>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
>>> tokenizer = AutoTokenizer.from_pretrained("potsawee/t5-large-generation-race-Distractor")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("potsawee/t5-large-generation-race-Distractor")
>>> context = r"""
... World number one Novak Djokovic says he is hoping for a "positive decision" to allow him
... to play at Indian Wells and the Miami Open next month. The United States has extended
... its requirement for international visitors to be vaccinated against Covid-19. Proof of vaccination
... will be required to enter the country until at least 10 April, but the Serbian has previously
... said he is unvaccinated. The 35-year-old has applied for special permission to enter the country.
... Indian Wells and the Miami Open - two of the most prestigious tournaments on the tennis calendar
... outside the Grand Slams - start on 6 and 20 March respectively. Djokovic says he will return to
... the ATP tour in Dubai next week after claiming a record-extending 10th Australian Open title
... and a record-equalling 22nd Grand Slam men's title last month.""".replace("\n", "")
>>> question = "What is the best title for the passage?"
>>> answer = "Djokovic's application for special permission to enter the United States"
>>> input_text = " ".join([question, tokenizer.sep_token, answer, tokenizer.sep_token, context])
>>> inputs = tokenizer(input_text, return_tensors="pt")
>>> outputs = model.generate(**inputs, max_new_tokens=128)
>>> distractors = tokenizer.decode(outputs[0], skip_special_tokens=False)
>>> distractors = distractors.replace(tokenizer.pad_token, "").replace(tokenizer.eos_token, "")
>>> distractors = [y.strip() for y in distractors.split(tokenizer.sep_token)]
>>> print(distractors)
['The United States has extended its requirement for international visitors to be vaccinated against Covid-19',
"Djokovic's return to the ATP tour in Dubai",
"Djokovic's hope for a positive decision to allow him to play at Indian Wells and the Miami Open"]
📚 Documentation
Citation
@article{manakul2023mqag,
title={MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization},
author={Manakul, Potsawee and Liusie, Adian and Gales, Mark JF},
journal={arXiv preprint arXiv:2301.12307},
year={2023}
}
📄 License
This project is released under the Apache-2.0 license.