🚀 t5-large針對RACE數據集微調以生成干擾項
本項目將t5-large模型針對RACE數據集進行微調,用於生成干擾項。輸入為問題、答案和上下文的組合,輸出為3個干擾項列表。該模型在問答生成流程中發揮重要作用,為相關研究和應用提供了有力支持。
🚀 快速開始
模型詳情
t5-large模型針對RACE數據集進行了微調。輸入是問題、答案和上下文的拼接,輸出是包含3個干擾項的列表。這是我們在 MQAG論文 中問答生成流程(即 g2
)的第二個組件。你也可以參考本項目的GitHub倉庫:https://github.com/potsawee/mqag0 。
如何使用模型
使用以下代碼開始使用該模型:
>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
>>> tokenizer = AutoTokenizer.from_pretrained("potsawee/t5-large-generation-race-Distractor")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("potsawee/t5-large-generation-race-Distractor")
>>> context = r"""
... World number one Novak Djokovic says he is hoping for a "positive decision" to allow him
... to play at Indian Wells and the Miami Open next month. The United States has extended
... its requirement for international visitors to be vaccinated against Covid-19. Proof of vaccination
... will be required to enter the country until at least 10 April, but the Serbian has previously
... said he is unvaccinated. The 35-year-old has applied for special permission to enter the country.
... Indian Wells and the Miami Open - two of the most prestigious tournaments on the tennis calendar
... outside the Grand Slams - start on 6 and 20 March respectively. Djokovic says he will return to
... the ATP tour in Dubai next week after claiming a record-extending 10th Australian Open title
... and a record-equalling 22nd Grand Slam men's title last month.""".replace("\n", "")
>>> question = "What is the best title for the passage?"
>>> answer = "Djokovic's application for special permission to enter the United States"
>>> input_text = " ".join([question, tokenizer.sep_token, answer, tokenizer.sep_token, context])
>>> inputs = tokenizer(input_text, return_tensors="pt")
>>> outputs = model.generate(**inputs, max_new_tokens=128)
>>> distractors = tokenizer.decode(outputs[0], skip_special_tokens=False)
>>> distractors = distractors.replace(tokenizer.pad_token, "").replace(tokenizer.eos_token, "")
>>> distractors = [y.strip() for y in distractors.split(tokenizer.sep_token)]
>>> print(distractors)
['The United States has extended its requirement for international visitors to be vaccinated against Covid-19',
"Djokovic's return to the ATP tour in Dubai",
"Djokovic's hope for a positive decision to allow him to play at Indian Wells and the Miami Open"]
引用
@article{manakul2023mqag,
title={MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization},
author={Manakul, Potsawee and Liusie, Adian and Gales, Mark JF},
journal={arXiv preprint arXiv:2301.12307},
year={2023}
}
📄 許可證
本項目採用Apache-2.0許可證。