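🚀 T5 End-to-End Question Generation
This model is a fine-tuned version of t5-base on the SQuAD dataset. Given a context passage, it generates questions that the passage can answer.
👉 If you want to learn how to fine-tune a T5 model to do the same, see this tutorial.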
Example
Context: "Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace."
Questions:
Who created Python?
When was Python first released?
What is Python's design philosophy?
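Evaluation results
On the evaluation set, the model reaches a loss of 1.5691 (the final validation loss listed in the training results).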
🚀 Quick start
Model usage
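from transformers import T5ForConditionalGeneration, T5TokenizerFast
# Load the fine-tuned model and its tokenizer.
# Assumption: the tokenizer files are hosted in the same model repository.
hfmodel = T5ForConditionalGeneration.from_pretrained("ThomasSimonini/t5-end2end-question-generation")
tokenizer = T5TokenizerFast.from_pretrained("ThomasSimonini/t5-end2end-question-generation")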
text = "The abolition of feudal privileges by the National Constituent Assembly on 4 August 1789 and the Declaration of the Rights of Man and of the Citizen (La Déclaration des Droits de l'Homme et du Citoyen), drafted by Lafayette with the help of Thomas Jefferson and adopted on 26 August, paved the way to a Constitutional Monarchy (4 September 1791 – 21 September 1792). Despite these dramatic changes, life at the court continued, while the situation in Paris was becoming critical because of bread shortages in September. On 5 October 1789, a crowd from Paris descended upon Versailles and forced the royal family to move to the Tuileries Palace in Paris, where they lived under a form of house arrest under the watch of Lafayette's Garde Nationale, while the Comte de Provence and his wife were allowed to reside in the Petit Luxembourg, where they remained until they went into exile on 20 June 1791."
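def run_model(input_string, **generator_args):
    # Default generation settings; keyword arguments passed by the caller override them.
    generation_kwargs = {
        "max_length": 256,
        "num_beams": 4,
        "length_penalty": 1.5,
        "no_repeat_ngram_size": 3,
        "early_stopping": True,
    }
    generation_kwargs.update(generator_args)
    # The model expects the "generate questions: " task prefix.
    input_string = "generate questions: " + input_string + " </s>"
    input_ids = tokenizer.encode(input_string, return_tensors="pt")
    res = hfmodel.generate(input_ids, **generation_kwargs)
    output = tokenizer.batch_decode(res, skip_special_tokens=True)
    # The generated questions are separated by "<sep>".
    output = [item.split("<sep>") for item in output]
    return output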
run_model(text)
=> [['When did the National Constituent Assembly abolish feudal privileges?',
' Who drafted the Declaration of the Rights of Man and of the Citizen?',
' When was the Constitutional Monarchy established?',
' What was the name of the Declaration that paved the way to a constitutional monarchy?',
'']]
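The raw output above keeps the leading space that follows each `<sep>` and ends with an empty string, because the generated text finishes with a separator. A minimal post-processing sketch, assuming the decoded text uses `<sep>` exactly as in the output above; `clean_questions` is a hypothetical helper name and is not part of the model or of Transformers:

```python
def clean_questions(decoded, sep="<sep>"):
    """Split one decoded generation into stripped, non-empty questions.

    `clean_questions` is a hypothetical helper, not a Transformers API.
    """
    parts = (piece.strip() for piece in decoded.split(sep))
    return [piece for piece in parts if piece]


# A decoded string shaped like the model output shown above.
decoded = "Who created Python?<sep> When was Python first released?<sep>"
questions = clean_questions(decoded)
print(questions)
```

The helper could be applied to each string returned by `tokenizer.batch_decode` in place of the plain `split("<sep>")` call.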
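Training hyperparameters
The following hyperparameters were used during training:
| Property | Details |
|---|---|
| Learning rate | 0.0001 |
| Train batch size | 4 |
| Eval batch size | 4 |
| Seed | 42 |
| Gradient accumulation steps | 16 |
| Total train batch size | 64 |
| Optimizer | Adam (betas=(0.9, 0.999), epsilon=1e-08) |
| Learning rate scheduler type | linear |
| Number of epochs | 7 |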
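The total train batch size listed above follows from the train batch size and the gradient accumulation steps. A quick check, assuming training ran on a single device (the device count is not stated here, so `num_devices` is an assumption):

```python
train_batch_size = 4               # from the hyperparameters above
gradient_accumulation_steps = 16   # from the hyperparameters above
num_devices = 1                    # assumption: not stated in the model card

# Gradients from 16 small batches are accumulated before each optimizer step,
# so one update sees train_batch_size * gradient_accumulation_steps examples.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 64
```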
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 2.5834 | 0.34 | 100 | 1.9107 |
| 1.9642 | 0.68 | 200 | 1.7227 |
| 1.8526 | 1.02 | 300 | 1.6627 |
| 1.7383 | 1.36 | 400 | 1.6354 |
| 1.7223 | 1.69 | 500 | 1.6154 |
| 1.6871 | 2.03 | 600 | 1.6096 |
| 1.6309 | 2.37 | 700 | 1.6048 |
| 1.6242 | 2.71 | 800 | 1.5923 |
| 1.6226 | 3.05 | 900 | 1.5855 |
| 1.5645 | 3.39 | 1000 | 1.5874 |
| 1.5705 | 3.73 | 1100 | 1.5822 |
| 1.5543 | 4.07 | 1200 | 1.5817 |
| 1.5284 | 4.41 | 1300 | 1.5841 |
| 1.5275 | 4.75 | 1400 | 1.5741 |
| 1.5269 | 5.08 | 1500 | 1.5715 |
| 1.5079 | 5.42 | 1600 | 1.5701 |
| 1.4876 | 5.76 | 1700 | 1.5754 |
| 1.498 | 6.1 | 1800 | 1.5699 |
| 1.4852 | 6.44 | 1900 | 1.5693 |
| 1.4776 | 6.78 | 2000 | 1.5691 |
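The training results can also be inspected programmatically. The sketch below copies the step, training loss, and validation loss values from the results above and looks up two things: the step with the lowest validation loss, and the first logged step at which the training loss falls below the validation loss. It is only an illustration of how to read the numbers, not part of the original training code.

```python
# (step, training_loss, validation_loss), copied from the training results above.
results = [
    (100, 2.5834, 1.9107), (200, 1.9642, 1.7227), (300, 1.8526, 1.6627),
    (400, 1.7383, 1.6354), (500, 1.7223, 1.6154), (600, 1.6871, 1.6096),
    (700, 1.6309, 1.6048), (800, 1.6242, 1.5923), (900, 1.6226, 1.5855),
    (1000, 1.5645, 1.5874), (1100, 1.5705, 1.5822), (1200, 1.5543, 1.5817),
    (1300, 1.5284, 1.5841), (1400, 1.5275, 1.5741), (1500, 1.5269, 1.5715),
    (1600, 1.5079, 1.5701), (1700, 1.4876, 1.5754), (1800, 1.498, 1.5699),
    (1900, 1.4852, 1.5693), (2000, 1.4776, 1.5691),
]

# Step with the lowest validation loss.
best_step, _, best_val_loss = min(results, key=lambda row: row[2])

# First logged step where the training loss is below the validation loss.
crossover_step = next(step for step, train, val in results if train < val)

print(best_step, best_val_loss)  # 2000 1.5691
print(crossover_step)            # 1000
```

The validation loss changes by less than 0.02 between step 1000 and step 2000 while the training loss keeps falling, so most of the validation improvement happens in the first three epochs.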
Framework versions
- Transformers 4.10.3
- PyTorch 1.9.0+cu102
- Datasets 1.12.1
- Tokenizers 0.10.3
📄 License
This project is licensed under the Apache-2.0 license.