🚀 RoBERTa-base for Question Answering
This project is built on the RoBERTa-base language model and focuses on extractive question answering, using the SQuAD 2.0 dataset for both training and evaluation.
🚀 Quick Start
This project uses the roberta-base language model for extractive question answering; SQuAD 2.0 is used for both training and evaluation.
✨ Key Features
- Language model: roberta-base.
- Downstream task: extractive question answering.
- Training and evaluation data: the SQuAD 2.0 dataset.
📦 Installation
The original documentation provides no specific installation steps.
💻 Usage Examples
Basic usage

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "PremalMatalia/roberta-base-best-squad2"

# a) Get predictions via the question-answering pipeline
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
QA_input = {
    'question': 'Which name is also used to describe the Amazon rainforest in English?',
    'context': 'The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain "Amazonas" in their names. The Amazon represents over half of the planet\'s remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species.'
}
res = nlp(QA_input)
print(res)

# b) Load the model and tokenizer directly
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
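Under the hood, an extractive QA model assigns each token a start logit and an end logit, and the predicted answer is the highest-scoring valid (start, end) span. A minimal, self-contained sketch of that span selection — the tokens and logits below are toy numbers for illustration, not actual model outputs:

```python
# Toy inputs: one token list plus per-token start/end scores.
tokens = ["The", "Amazon", "is", "also", "called", "Amazonia"]
start_logits = [0.1, 0.2, 0.0, 0.3, 0.1, 2.5]
end_logits   = [0.0, 0.1, 0.2, 0.1, 0.3, 2.8]

# Search every valid span (end must not precede start) for the best
# combined start + end score.
best_score, best_span = float("-inf"), (0, 0)
for i, s in enumerate(start_logits):
    for j in range(i, len(end_logits)):
        score = s + end_logits[j]
        if score > best_score:
            best_score, best_span = score, (i, j)

answer = " ".join(tokens[best_span[0]:best_span[1] + 1])
print(answer)  # → Amazonia
```

In practice the pipeline also restricts spans to `max_answer_length` tokens and keeps the `n_best_size` top candidates, matching the hyperparameters listed below.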
📚 Documentation
Environment Information

| Attribute | Details |
|-----------|---------|
| transformers version | 4.9.1 |
| Platform | Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic |
| Python version | 3.7.11 |
| PyTorch version (GPU?) | 1.9.0+cu102 (No) |
| Tensorflow version (GPU?) | 2.5.0 (No) |
Hyperparameters

```
max_seq_len = 386
doc_stride = 128
n_best_size = 20
max_answer_length = 30
min_null_score = 7.0
batch_size = 8
n_epochs = 6
base_LM_model = "roberta-base"
learning_rate = 1.5e-5
adam_epsilon = 1e-5
adam_beta1 = 0.95
adam_beta2 = 0.999
warmup_steps = 100
weight_decay = 0.01
optimizer = AdamW
lr_scheduler = "polynomial"
```
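The schedule above pairs linear warmup over `warmup_steps` with polynomial decay of the learning rate. A minimal sketch of such a schedule, assuming decay power 1.0 (i.e. linear decay, the common default for polynomial schedules) and an illustrative `total_steps` — the card does not state the true total step count:

```python
def lr_at_step(step, base_lr=1.5e-5, warmup_steps=100,
               total_steps=10000, power=1.0, end_lr=0.0):
    """Linear warmup to base_lr, then polynomial decay toward end_lr.

    total_steps and power are illustrative assumptions, not values
    taken from the model card.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return end_lr + (base_lr - end_lr) * remaining ** power

print(lr_at_step(50))     # mid-warmup: half of base_lr
print(lr_at_step(100))    # warmup finished: full base_lr
print(lr_at_step(10000))  # end of training: decayed to end_lr
```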
⚠️ Important Note
A special threshold, CLS_threshold = -3, is used to identify no-answer cases more accurately; the exact logic will be provided in a GitHub repository (to be updated).
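The card defers the exact thresholding rule to the forthcoming repository, so the following is only an assumed sketch of the common SQuAD 2.0 pattern: compare the null (CLS) score against the best non-null span score and predict "no answer" when the null score wins by more than the threshold. The function name and signature are hypothetical.

```python
def predict_answer(best_span_score, null_score, span_text, cls_threshold=-3.0):
    """Return the span answer unless the null (CLS) score beats the best
    span score by more than cls_threshold.

    NOTE: illustrative logic only; the model card defers the exact rule
    to a GitHub repository.
    """
    if null_score - best_span_score > cls_threshold:
        return ""  # predict "no answer"
    return span_text

# Confident span beats the null score -> answer is kept.
print(predict_answer(best_span_score=5.0, null_score=1.0, span_text="Amazonia"))
# Null score dominates -> empty string, i.e. "no answer".
print(predict_answer(best_span_score=1.0, null_score=4.0, span_text="Amazonia"))
```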
Performance Metrics

```
"exact": 81.192622
"f1": 83.95408
"total": 11873
"HasAns_exact": 74.190283
"HasAns_f1": 79.721119
"HasAns_total": 5928
"NoAns_exact": 88.174937
"NoAns_f1": 88.174937
"NoAns_total": 5945
```
🔧 Technical Details
The original documentation provides no further technical details.
📄 License
The original documentation provides no license information.
👥 Author
Premal Matalia