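🚀 t5-large fine-tuned on the RACE dataset for distractor generation
This project fine-tunes the t5-large model on the RACE dataset to generate distractors. The input combines a question, its answer, and the context passage; the output is a list of 3 distractors. The model serves as one component of a question-answer generation pipeline.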
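🚀 Quick Start
Model Details
The t5-large model is fine-tuned on the RACE dataset. The input is the concatenation of the question, the answer, and the context, and the output is a list of 3 distractors. This is the second component (i.e., g2) of the question-answer generation pipeline in our MQAG paper. You can also refer to the project's GitHub repository: https://github.com/potsawee/mqag0 .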
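The input layout described above can be sketched as a small helper. This is only an illustration, not code from the MQAG repository: `build_distractor_input` is a hypothetical name, the whitespace normalization is a choice made for this sketch, and the separator should be read from `tokenizer.sep_token` rather than hard-coded.

```python
def build_distractor_input(question: str, answer: str, context: str, sep_token: str) -> str:
    """Join question, answer, and context with the separator token between them.

    Hypothetical helper for illustration; `sep_token` is expected to come
    from the tokenizer (tokenizer.sep_token).
    """
    # Assumption of this sketch: collapse newlines and repeated spaces so the
    # context is passed to the model as one continuous passage.
    context = " ".join(context.split())
    # Produces: "<question> <sep> <answer> <sep> <context>"
    return f" {sep_token} ".join([question.strip(), answer.strip(), context])
```

The returned string can then be passed to the tokenizer with `return_tensors="pt"`, in the same way as `input_text` in the usage example.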
How to Use the Model
Use the code below to get started with the model:
>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
>>> tokenizer = AutoTokenizer.from_pretrained("potsawee/t5-large-generation-race-Distractor")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("potsawee/t5-large-generation-race-Distractor")
>>> context = r"""
... World number one Novak Djokovic says he is hoping for a "positive decision" to allow him
... to play at Indian Wells and the Miami Open next month. The United States has extended
... its requirement for international visitors to be vaccinated against Covid-19. Proof of vaccination
... will be required to enter the country until at least 10 April, but the Serbian has previously
... said he is unvaccinated. The 35-year-old has applied for special permission to enter the country.
... Indian Wells and the Miami Open - two of the most prestigious tournaments on the tennis calendar
... outside the Grand Slams - start on 6 and 20 March respectively. Djokovic says he will return to
... the ATP tour in Dubai next week after claiming a record-extending 10th Australian Open title
... and a record-equalling 22nd Grand Slam men's title last month.""".replace("\n", "")
>>> question = "What is the best title for the passage?"
>>> answer = "Djokovic's application for special permission to enter the United States"
>>> input_text = " ".join([question, tokenizer.sep_token, answer, tokenizer.sep_token, context])
>>> inputs = tokenizer(input_text, return_tensors="pt")
>>> outputs = model.generate(**inputs, max_new_tokens=128)
>>> distractors = tokenizer.decode(outputs[0], skip_special_tokens=False)
>>> distractors = distractors.replace(tokenizer.pad_token, "").replace(tokenizer.eos_token, "")
>>> distractors = [y.strip() for y in distractors.split(tokenizer.sep_token)]
>>> print(distractors)
['The United States has extended its requirement for international visitors to be vaccinated against Covid-19',
"Djokovic's return to the ATP tour in Dubai",
"Djokovic's hope for a positive decision to allow him to play at Indian Wells and the Miami Open"]
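The post-processing steps in the example (stripping the padding and end-of-sequence tokens, then splitting on the separator) can be wrapped in reusable functions. The sketch below is an assumption-laden illustration: `parse_distractors` and `drop_invalid` are hypothetical names, and the extra filtering step is not part of the original example. It is included because nothing in the decoding step enforces that exactly three distinct distractors come back.

```python
from typing import List


def parse_distractors(decoded: str, sep_token: str, pad_token: str, eos_token: str) -> List[str]:
    """Split a decoded generation into individual distractor strings."""
    # Remove padding and end-of-sequence markers; keep the separator for splitting.
    cleaned = decoded.replace(pad_token, "").replace(eos_token, "")
    # Split on the separator and discard empty pieces.
    return [part.strip() for part in cleaned.split(sep_token) if part.strip()]


def drop_invalid(distractors: List[str], answer: str) -> List[str]:
    """Remove distractors that repeat the answer or each other (case-insensitive)."""
    seen = {answer.strip().lower()}
    kept = []
    for candidate in distractors:
        key = candidate.strip().lower()
        if key not in seen:
            seen.add(key)
            kept.append(candidate)
    return kept
```

With the real tokenizer, the token arguments would be `tokenizer.sep_token`, `tokenizer.pad_token`, and `tokenizer.eos_token`, and `decoded` would be the string returned by `tokenizer.decode(outputs[0], skip_special_tokens=False)`.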
Citation
@article{manakul2023mqag,
title={MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization},
author={Manakul, Potsawee and Liusie, Adian and Gales, Mark JF},
journal={arXiv preprint arXiv:2301.12307},
year={2023}
}
📄 License
This project is licensed under the Apache-2.0 license.