Open-source MixQG-3B model: generate relevant questions from a context and an answer

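MixQG 3B

Developed by Salesforce

MixQG is a neural question generation model trained on mixed answer types. Given a context and an answer, it generates a question that the answer responds to.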

Question answering

Transformers

English

#question-generation #mixed-answer-types #academic-research-assistance

Released: 3/2/2022

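Model overview

MixQG is a question generation model pre-trained on question answering datasets that cover a mix of answer types. Its main use is generating a question from a given context and answer.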

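Model features

Mixed answer type support

Handles question generation across several answer types

Large-scale pre-training

Built on a model with roughly 3 billion parameters

English-language support

Tagged as an English model; the usage examples take English input and produce English questions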

Model capabilities

Text generation

Question generation

Natural language processing

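Use cases

Education

Automatic test question generation

Generate questions from textbook content for use in tests

Helps teachers draft test items faster

Question answering systems

Augmenting question answering systems

Generate questions to enrich a question answering system's knowledge base

Broadens the system's coverage and improves the user experience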

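🚀 MixQG (3b-sized model)

MixQG is a new question generation model pre-trained on a collection of question answering datasets with a mix of answer types. It was introduced in the paper MixQG: Neural Question Generation with Mixed Answer Types, and the associated code is released in this repository.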

🚀 Quick start

How to use

Basic usage

Using the Huggingface pipeline abstraction:

from transformers import pipeline

# Load MixQG-3B behind the text2text-generation pipeline.
nlp = pipeline("text2text-generation", model='Salesforce/mixqg-3b', tokenizer='Salesforce/mixqg-3b')

CONTEXT = "In the late 17th century, Robert Boyle proved that air is necessary for combustion."
ANSWER = "Robert Boyle"

def format_inputs(context: str, answer: str) -> str:
    # The answer comes first, then the separator, then the context.
    # As written, "\\n" is a literal backslash followed by "n", not a newline character.
    return f"{answer} \\n {context}"

text = format_inputs(CONTEXT, ANSWER)

nlp(text)
# should output [{'generated_text': 'Who proved that air is necessary for combustion?'}]
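Since the model conditions on the answer, several questions can be drawn from one passage by varying the answer. The following is a minimal sketch of preparing such inputs, reusing the `format_inputs` layout from the example above. The helper `build_batch`, the constant `SEPARATOR`, and the extra answers are illustrative assumptions, not part of the MixQG release, and no model is loaded here.

```python
from typing import List, Tuple

# Literal backslash + "n" between answer and context, matching format_inputs above.
SEPARATOR = " \\n "


def format_inputs(context: str, answer: str) -> str:
    """Build one model input in the answer-separator-context layout."""
    return f"{answer}{SEPARATOR}{context}"


def build_batch(context: str, answers: List[str]) -> List[Tuple[str, str]]:
    """Pair each non-blank answer with its formatted model input.

    No span check is done: whether an answer must appear verbatim in the
    context is not stated in this card, so that decision is left to the caller.
    """
    batch = []
    for answer in answers:
        answer = answer.strip()
        if answer:  # skip blank answers, which give the model nothing to target
            batch.append((answer, format_inputs(context, answer)))
    return batch


CONTEXT = "In the late 17th century, Robert Boyle proved that air is necessary for combustion."
# Hypothetical answers of different kinds: an entity, a time phrase, a clause, and a blank.
ANSWERS = ["Robert Boyle", "the late 17th century", "air is necessary for combustion", "  "]

batch = build_batch(CONTEXT, ANSWERS)
for answer, text in batch:
    print(repr(text))
```

Each formatted string can then be passed to `nlp(...)` in place of `text`; the questions produced for the additional answers are not shown here because they depend on the model.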

Advanced usage

Using the pre-trained model directly:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the tokenizer and the sequence-to-sequence model weights.
tokenizer = AutoTokenizer.from_pretrained('Salesforce/mixqg-3b')
model = AutoModelForSeq2SeqLM.from_pretrained('Salesforce/mixqg-3b')

CONTEXT = "In the late 17th century, Robert Boyle proved that air is necessary for combustion."
ANSWER = "Robert Boyle"

def format_inputs(context: str, answer: str) -> str:
    # As written, "\\n" is a literal backslash followed by "n", not a newline character.
    return f"{answer} \\n {context}"

text = format_inputs(CONTEXT, ANSWER)

# Tokenize, generate with beam search, and decode back to text.
input_ids = tokenizer(text, return_tensors="pt").input_ids
generated_ids = model.generate(input_ids, max_length=32, num_beams=4)
output = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(output)
# batch_decode returns a list, so this should print
# ['Who proved that air is necessary for combustion?']
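When there are many (context, answer) pairs, the tokenizer and `generate` calls above can be applied to a whole batch at once. The sketch below shows one way to do that, assuming a `tokenizer` and `model` loaded as in the example above. The function name `generate_questions` and the choice of `padding=True` and `truncation=True` are assumptions made for illustration, not settings documented for MixQG.

```python
from typing import List, Sequence, Tuple


def generate_questions(model, tokenizer, pairs: Sequence[Tuple[str, str]],
                       max_length: int = 32, num_beams: int = 4) -> List[str]:
    """Generate one question per (context, answer) pair in a single batch."""
    if not pairs:
        return []
    # Same input layout as format_inputs above: answer, literal "\n" marker, context.
    texts = [f"{answer} \\n {context}" for context, answer in pairs]
    # padding=True pads to the longest input in the batch; the attention mask
    # lets generate() ignore the padded positions.
    enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    generated_ids = model.generate(
        input_ids=enc["input_ids"],
        attention_mask=enc["attention_mask"],
        max_length=max_length,
        num_beams=num_beams,
    )
    questions = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
    return [q.strip() for q in questions]


# Usage, assuming `tokenizer`, `model`, and CONTEXT from the example above:
# questions = generate_questions(model, tokenizer,
#                                [(CONTEXT, "Robert Boyle"), (CONTEXT, "air")])
```

The defaults mirror the `max_length=32, num_beams=4` values used in the single-input example; for a 3B-parameter model, batch size will mostly be limited by available memory.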

📚 Detailed documentation

Citation

@misc{murakhovska2021mixqg,
      title={MixQG: Neural Question Generation with Mixed Answer Types}, 
      author={Lidiya Murakhovs'ka and Chien-Sheng Wu and Tong Niu and Wenhao Liu and Caiming Xiong},
      year={2021},
      eprint={2110.08175},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

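🔧 Technical details

Ethical considerations

This release is for research purposes only, in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend that users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and follow best practices when selecting use cases, particularly in high-risk scenarios where errors or misuse could significantly affect people's lives, rights, or safety. For further guidance on use cases, refer to our Acceptable Use Policy (AUP) and AI Acceptable Use Policy (AI AUP).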