🚀 RoBERTa-base for Question Answering
This project is built on the RoBERTa-base language model and focuses on extractive question answering, using the SQuAD 2.0 dataset for both training and evaluation.
🚀 Quick Start
This project uses the roberta-base language model for extractive question answering; SQuAD 2.0 is used for both training and evaluation.
✨ Key Features
- Language model: roberta-base.
- Downstream task: extractive question answering.
- Training and evaluation data: the SQuAD 2.0 dataset.
📦 Installation
The original documentation provides no specific installation steps.
💻 Usage Examples
Basic usage

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "PremalMatalia/roberta-base-best-squad2"

# a) Get predictions via the question-answering pipeline
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
QA_input = {
    'question': 'Which name is also used to describe the Amazon rainforest in English?',
    'context': 'The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain "Amazonas" in their names. The Amazon represents over half of the planet\'s remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species.'
}
res = nlp(QA_input)
print(res)

# b) Load the model and tokenizer directly
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
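Under the hood, an extractive QA model assigns each token a start logit and an end logit, and the predicted answer is the highest-scoring valid (start, end) span. A minimal, self-contained sketch of that span selection — the tokens and logits below are toy numbers for illustration, not actual model outputs:

```python
# Toy inputs: one token list plus per-token start/end scores.
tokens = ["The", "Amazon", "is", "also", "called", "Amazonia"]
start_logits = [0.1, 0.2, 0.0, 0.3, 0.1, 2.5]
end_logits   = [0.0, 0.1, 0.2, 0.1, 0.3, 2.8]

# Search every valid span (end must not precede start) for the best
# combined start + end score.
best_score, best_span = float("-inf"), (0, 0)
for i, s in enumerate(start_logits):
    for j in range(i, len(end_logits)):
        score = s + end_logits[j]
        if score > best_score:
            best_score, best_span = score, (i, j)

answer = " ".join(tokens[best_span[0]:best_span[1] + 1])
print(answer)  # → Amazonia
```

In practice the pipeline also restricts spans to `max_answer_length` tokens and keeps the `n_best_size` top candidates, matching the hyperparameters listed below.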
📚 Documentation
Environment Information

| Attribute | Details |
|-----------|---------|
| transformers version | 4.9.1 |
| Platform | Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic |
| Python version | 3.7.11 |
| PyTorch version (GPU?) | 1.9.0+cu102 (No) |
| Tensorflow version (GPU?) | 2.5.0 (No) |
Hyperparameters

```
max_seq_len = 386
doc_stride = 128
n_best_size = 20
max_answer_length = 30
min_null_score = 7.0
batch_size = 8
n_epochs = 6
base_LM_model = "roberta-base"
learning_rate = 1.5e-5
adam_epsilon = 1e-5
adam_beta1 = 0.95
adam_beta2 = 0.999
warmup_steps = 100
weight_decay = 0.01
optimizer = AdamW
lr_scheduler = "polynomial"
```
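The schedule above pairs linear warmup over `warmup_steps` with polynomial decay of the learning rate. A minimal sketch of such a schedule, assuming decay power 1.0 (i.e. linear decay, the common default for polynomial schedules) and an illustrative `total_steps` — the card does not state the true total step count:

```python
def lr_at_step(step, base_lr=1.5e-5, warmup_steps=100,
               total_steps=10000, power=1.0, end_lr=0.0):
    """Linear warmup to base_lr, then polynomial decay toward end_lr.

    total_steps and power are illustrative assumptions, not values
    taken from the model card.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return end_lr + (base_lr - end_lr) * remaining ** power

print(lr_at_step(50))     # mid-warmup: half of base_lr
print(lr_at_step(100))    # warmup finished: full base_lr
print(lr_at_step(10000))  # end of training: decayed to end_lr
```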
⚠️ Important Note
A special threshold, CLS_threshold = -3, is used to identify no-answer cases more accurately; the exact logic will be provided in a GitHub repository (to be updated).
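The card defers the exact thresholding rule to the forthcoming repository, so the following is only an assumed sketch of the common SQuAD 2.0 pattern: compare the null (CLS) score against the best non-null span score and predict "no answer" when the null score wins by more than the threshold. The function name and signature are hypothetical.

```python
def predict_answer(best_span_score, null_score, span_text, cls_threshold=-3.0):
    """Return the span answer unless the null (CLS) score beats the best
    span score by more than cls_threshold.

    NOTE: illustrative logic only; the model card defers the exact rule
    to a GitHub repository.
    """
    if null_score - best_span_score > cls_threshold:
        return ""  # predict "no answer"
    return span_text

# Confident span beats the null score -> answer is kept.
print(predict_answer(best_span_score=5.0, null_score=1.0, span_text="Amazonia"))
# Null score dominates -> empty string, i.e. "no answer".
print(predict_answer(best_span_score=1.0, null_score=4.0, span_text="Amazonia"))
```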
Performance Metrics

```
"exact": 81.192622
"f1": 83.95408
"total": 11873
"HasAns_exact": 74.190283
"HasAns_f1": 79.721119
"HasAns_total": 5928
"NoAns_exact": 88.174937
"NoAns_f1": 88.174937
"NoAns_total": 5945
```
🔧 Technical Details
The original documentation provides no further technical details.
📄 License
The original documentation provides no license information.
👥 Author
Premal Matalia