roberta-base-best-squad2: a free, open-source English question answering model that handles both answerable and unanswerable questions

Roberta Base Best Squad2

Developed by PremalMatalia

An English extractive question answering model based on RoBERTa and trained on the SQuAD 2.0 dataset. It handles both answerable and unanswerable questions.

Question Answering

Transformers

#EnglishQuestionAnswering #HighAccuracyReadingComprehension #SQuAD2.0Optimized

Downloads: 30

Release date: 3/2/2022

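Model Overview

This model is a question answering system built on the RoBERTa-base architecture and fine-tuned on the SQuAD 2.0 dataset. Given a passage of text, it either extracts the answer to a question from that passage or determines that the question cannot be answered from it.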

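Model Features

No-answer detection

Uses a special threshold, CLS_threshold=-3, to identify unanswerable questions more accurately.

High performance

Achieves 81.19% exact match and an F1 score of 83.95% on the SQuAD 2.0 evaluation set.

Tuned training setup

Trained for 6 epochs with the AdamW optimizer and a polynomial learning-rate scheduler.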

Model Capabilities

Text understanding

Question answering

No-answer detection

Context analysis

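Use Cases

Education

Reading comprehension support

Helps students quickly find the answer to a question in a text.

Improves learning efficiency and comprehension.

Customer service

Automated FAQ answering

Extracts answers to questions from knowledge-base documents.

Reduces the workload of human support agents.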

🚀 RoBERTa-base for QA

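This is a RoBERTa-base language model specialized for English extractive question answering. It was trained on the SQuAD 2.0 dataset.

🚀 Quick Start

The model is intended for English extractive question answering. The sections below cover the model overview, environment information, hyperparameters, performance, and usage.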

✨ Key Features

Language model: 'roberta-base'
Language: English
Downstream task: extractive question answering
Training data: SQuAD 2.0
Evaluation data: SQuAD 2.0

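📦 Installation

The model card lists no installation steps. The usage example below depends on the `transformers` library with a PyTorch backend; the versions used by the author are listed under Environment Information.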

💻 Usage Examples

Basic Usage

from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "PremalMatalia/roberta-base-best-squad2"

# a) Get predictions
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
QA_input = {
    'question': 'Which name is also used to describe the Amazon rainforest in English?',
    'context': 'The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain "Amazonas" in their names. The Amazon represents over half of the planet\'s remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species.'
}
res = nlp(QA_input)
print(res)

# b) Load model & tokenizer
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
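
Because the model was trained on SQuAD 2.0, some questions have no answer in the given context. The following is a minimal sketch of how that case might be surfaced through the pipeline. It assumes the `handle_impossible_answer` argument of the `transformers` question-answering pipeline, which allows an empty answer string to be returned. The helpers `describe_result` and `ask` are hypothetical names, and this is not the author's CLS_threshold logic.

```python
def describe_result(result):
    """Turn a pipeline result dict into a readable string.

    An empty 'answer' string is treated as "no answer found".
    """
    answer = result.get("answer", "").strip()
    if not answer:
        return "No answer found in the context."
    return f"{answer} (score={result['score']:.3f})"


def ask(question, context, model_name="PremalMatalia/roberta-base-best-squad2"):
    # Imported lazily so describe_result can be used without transformers installed.
    from transformers import pipeline

    nlp = pipeline("question-answering", model=model_name, tokenizer=model_name)
    # handle_impossible_answer=True lets the pipeline return the empty (null) answer.
    result = nlp(question=question, context=context, handle_impossible_answer=True)
    return describe_result(result)


# Example call (downloads the model on first use):
# context = "The Amazon rainforest covers most of the Amazon basin of South America."
# print(ask("Who discovered penicillin?", context))
```

Keeping the formatting step separate from the model call makes it easy to check the no-answer branch without loading the model.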

📚 Documentation

Environment Information

Property	Details
`transformers` version	4.9.1
Platform	Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic
Python version	3.7.11
PyTorch version (GPU?)	1.9.0+cu102 (False)
Tensorflow version (GPU?)	2.5.0 (False)

Hyperparameters

max_seq_len=386
doc_stride=128
n_best_size=20
max_answer_length=30
min_null_score=7.0
batch_size=8
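
The `max_seq_len` and `doc_stride` settings above control how contexts longer than one input are handled. As a rough sketch, assuming `doc_stride` is the number of tokens shared between consecutive windows (the meaning of the `stride` argument in Hugging Face tokenizers; the model card does not say which convention was used), a long context might be split as follows. The function `split_into_windows` and the question length are hypothetical, and special tokens are ignored for simplicity.

```python
def split_into_windows(num_context_tokens, max_context_len, overlap):
    """Return (start, end) token offsets of overlapping context windows.

    Each window holds at most max_context_len tokens and shares `overlap`
    tokens with the previous window.
    """
    if overlap >= max_context_len:
        raise ValueError("overlap must be smaller than the window length")
    windows = []
    start = 0
    while True:
        end = min(start + max_context_len, num_context_tokens)
        windows.append((start, end))
        if end == num_context_tokens:
            break
        start = end - overlap  # step forward, keeping `overlap` tokens
    return windows


max_seq_len = 386   # from the model card
doc_stride = 128    # from the model card; treated here as the overlap (assumption)
question_len = 16   # hypothetical number of tokens taken by the question

# Room left for context tokens in each window (special tokens ignored).
max_context_len = max_seq_len - question_len
windows = split_into_windows(1000, max_context_len, doc_stride)
print(windows)
```

Each window is scored separately, and the best answer across windows is kept; the overlap reduces the chance that an answer is cut in half at a window boundary.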

n_epochs=6
base_LM_model = "roberta-base"
learning_rate=1.5e-5
adam_epsilon=1e-5
adam_beta1=0.95
adam_beta2=0.999
warmup_steps=100
weight_decay=0.01
optimizer=AdamW
lr_scheduler="polynomial"
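
To illustrate what `lr_scheduler="polynomial"` combined with `warmup_steps=100` implies, here is a minimal pure-Python sketch of a polynomial-decay schedule with linear warmup. It follows the formula used by `get_polynomial_decay_schedule_with_warmup` in `transformers`. The final learning rate (`lr_end`), the `power`, and the total number of training steps are not given in the model card and are assumptions here.

```python
def polynomial_lr(step, lr_init=1.5e-5, warmup_steps=100,
                  total_steps=10_000, lr_end=1e-7, power=1.0):
    """Learning rate at a given optimizer step.

    lr_init and warmup_steps come from the model card; total_steps,
    lr_end and power are assumed values for illustration.
    """
    if step < warmup_steps:
        # Linear warmup from 0 up to lr_init.
        return lr_init * step / max(1, warmup_steps)
    if step > total_steps:
        return lr_end
    decay_steps = total_steps - warmup_steps
    pct_remaining = 1 - (step - warmup_steps) / decay_steps
    # Polynomial decay from lr_init down to lr_end.
    return (lr_init - lr_end) * pct_remaining ** power + lr_end


for s in (0, 50, 100, 5_000, 10_000):
    print(s, polynomial_lr(s))
```

With `power=1.0` the decay is linear; larger powers make the learning rate fall faster early in training.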

A special threshold, CLS_threshold=-3, is used to identify unanswerable questions more accurately (the logic is to be published in a GitHub repository; TBD).
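
The author's thresholding logic has not been published, so the following is only a sketch of the common SQuAD 2.0 convention it may resemble. The score of the null answer (start and end logits at the CLS position) is compared with the score of the best non-null span, and the question is treated as unanswerable when the difference exceeds a threshold. The function name, the example logits, and the way `CLS_threshold=-3` is applied are all assumptions.

```python
def predict_with_null_threshold(start_logits, end_logits, threshold=-3.0,
                                max_answer_length=30):
    """Return (start, end) of the best span, or None for "no answer".

    Index 0 is assumed to be the CLS token, whose logits score the null answer.
    """
    null_score = start_logits[0] + end_logits[0]
    best_score, best_span = float("-inf"), None
    for start in range(1, len(start_logits)):
        last = min(start + max_answer_length, len(end_logits))
        for end in range(start, last):
            score = start_logits[start] + end_logits[end]
            if score > best_score:
                best_score, best_span = score, (start, end)
    # Predict "no answer" when the null score minus the best span score
    # exceeds the threshold (a negative threshold favours "no answer").
    if best_span is None or null_score - best_score > threshold:
        return None
    return best_span


# Hypothetical logits: the span clearly wins, so it is returned.
print(predict_with_null_threshold([1.0, 0.2, 5.0, 0.1], [1.0, 0.1, 0.3, 6.0]))
# Hypothetical logits: the span wins by only 2, which is within the margin.
print(predict_with_null_threshold([1.0, 2.0, 0.0], [1.0, 2.0, 0.0]))
```

Under this convention a threshold of -3 means a span must beat the null answer by at least 3 logit units before it is returned.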

Performance
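
The scores in this section are exact match and token-level F1. The sketch below shows how these are typically computed per example, in the style of the official SQuAD evaluation script (lowercasing, stripping punctuation and English articles). It then checks that the overall scores are the count-weighted averages of the HasAns and NoAns figures reported in this section. The function names are illustrative.

```python
import collections
import re
import string

PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(gold, pred):
    return int(normalize_answer(gold) == normalize_answer(pred))


def f1_score(gold, pred):
    gold_tokens = normalize_answer(gold).split()
    pred_tokens = normalize_answer(pred).split()
    if not gold_tokens or not pred_tokens:
        # For unanswerable questions, F1 is 1 only if both are empty.
        return float(gold_tokens == pred_tokens)
    common = collections.Counter(gold_tokens) & collections.Counter(pred_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Overall scores as count-weighted averages of the reported subsets.
has_total, no_total = 5928, 5945
total = has_total + no_total
overall_exact = (74.190283 * has_total + 88.174937 * no_total) / total
overall_f1 = (79.721119 * has_total + 88.174937 * no_total) / total
print(overall_exact, overall_f1)
```

For unanswerable questions a prediction is either fully right or fully wrong, which is why NoAns_exact and NoAns_f1 are identical.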

"exact": 81.192622
"f1":    83.95408
"total": 11873
"HasAns_exact": 74.190283
"HasAns_f1":    79.721119
"HasAns_total": 5928
"NoAns_exact":  88.174937
"NoAns_f1":     88.174937
"NoAns_total":  5945