ModernBERT-QnA-base-squad: An Open-Source Model for Extractive Question Answering


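Developed by rankyx

A question-answering model fine-tuned from ModernBERT. It performs well on the SQuAD dataset and is suited to extractive question answering.

Task: Question answering

Library: Transformers

Language: English
License: Apache-2.0
Tags: #extractive-QA #high-F1 #SQuAD-optimized

Downloads: 1,106

Released: 1/8/2025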

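Model Overview

This model is a question-answering system fine-tuned from answerdotai/ModernBERT-base. It is optimized for the SQuAD dataset and extracts the answer to a question from a given context passage.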

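Model Features

High-accuracy question answering

Achieves an F1 score of 92.59 and an Exact Match (EM) score of 86.45 on the SQuAD dataset.

Built on ModernBERT

Uses the ModernBERT architecture, a modernized BERT-style encoder, for stronger language understanding.

Easy to integrate

Works with the Hugging Face Transformers pipeline API, so it can be deployed quickly.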

Model Capabilities

Extractive question answering

Text understanding

Context analysis

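Use Cases

Education

Reading-comprehension assistance

Helps students quickly locate the answer to a question in a text

Improves learning efficiency and comprehension

Customer Service

Automated FAQ answering

Extracts answers automatically from knowledge-base documents

Reduces the workload of human support agents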
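
For the FAQ use case, one simple approach is to run the reader over each knowledge-base passage and keep the highest-scoring span. The sketch below is a minimal illustration, not part of the model's API: `best_answer`, `fake_qa`, and the two FAQ passages are hypothetical, and `fake_qa` only stands in for the real question-answering pipeline so that the example runs without downloading the model.

```python
from typing import Callable, Iterable, Optional

# Any callable shaped like the question-answering pipeline:
# qa(question=..., context=...) -> {"score", "start", "end", "answer"}
QAFn = Callable[..., dict]


def best_answer(qa: QAFn, question: str, passages: Iterable[str]) -> Optional[dict]:
    """Run the reader over every passage and keep the highest-scoring span."""
    best = None
    for index, passage in enumerate(passages):
        result = dict(qa(question=question, context=passage))
        result["passage_index"] = index  # remember which passage the answer came from
        if best is None or result["score"] > best["score"]:
            best = result
    return best


def fake_qa(question: str, context: str) -> dict:
    """Hypothetical stand-in reader (NOT the real model), for illustration only."""
    needle = "30 days"
    start = context.find(needle)
    if start == -1:
        return {"score": 0.01, "start": 0, "end": 0, "answer": ""}
    return {"score": 0.9, "start": start, "end": start + len(needle), "answer": needle}


faq = [
    "Shipping usually takes three to five business days.",
    "Items can be returned within 30 days of delivery.",
]
top = best_answer(fake_qa, "How long do I have to return an item?", faq)
print(top["answer"], top["passage_index"])
```

In practice `fake_qa` would be replaced by the pipeline object, and scores from different passages are only roughly comparable, so the ranking is best treated as a heuristic.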

🚀 ModernBERT-QnA-base-squad

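ModernBERT-QnA-base-squad is a ModernBERT model fine-tuned for question answering. It performs well on the SQuAD dataset and is well suited to extractive question-answering applications.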

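🚀 Quick Start

Before using the fine-tuned model for question answering, you need to install a specific fork of transformers. This is required until the official PR is merged.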

> pip uninstall transformers -y
> git clone https://github.com/bakrianoo/transformers.git
> cd transformers && git checkout feat-ModernBert-QnA-Support && pip install -e .

Basic Usage

from transformers.models.modernbert.modular_modernbert import ModernBertForQuestionAnswering
from transformers import AutoTokenizer, pipeline

# Load the model and tokenizer
model_id = "rankyx/ModernBERT-QnA-base-squad"
model = ModernBertForQuestionAnswering.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Initialize the question-answering pipeline
question_answerer = pipeline("question-answering", model=model, tokenizer=tokenizer)

# Example input
question = "How many parameters does BLOOM contain?"
context = "BLOOM has 176 billion parameters and can generate text in 46 natural languages and 13 programming languages."

# Get the answer
result = question_answerer(question=question, context=context)
print(result)

Example output

{'score': 0.7719728946685791, 'start': 9, 'end': 21, 'answer': '176 billion'}
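
The `start` and `end` fields are character offsets into `context`, with `end` exclusive, so the answer span can be recovered by slicing. The sketch below reuses the example output above as a hard-coded dictionary (no model is loaded); `span_text` is a hypothetical helper name. Note that the offsets in this example include the space before "176", so stripping whitespace is a reasonable post-processing step.

```python
context = (
    "BLOOM has 176 billion parameters and can generate text in "
    "46 natural languages and 13 programming languages."
)
# Copied from the example output above rather than produced by a live model call.
result = {"score": 0.7719728946685791, "start": 9, "end": 21, "answer": "176 billion"}


def span_text(context: str, result: dict) -> str:
    """Slice the predicted span out of the context using character offsets."""
    return context[result["start"]:result["end"]]


raw = span_text(context, result)  # includes the leading space at offset 9
clean = raw.strip()               # whitespace-trimmed answer text
print(repr(raw), repr(clean))
```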

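✨ Key Features

High performance: reaches an F1 score of 92.59 and an Exact Match (EM) score of 86.45 on the SQuAD dataset.
Easy to use: can be called directly through the Hugging Face pipeline API.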

📦 Installation

> pip uninstall transformers -y
> git clone https://github.com/bakrianoo/transformers.git
> cd transformers && git checkout feat-ModernBert-QnA-Support && pip install -e .

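📚 Detailed Documentation

Model Overview

Model ID: rankyx/ModernBERT-QnA-base-squad
Base model: answerdotai/ModernBERT-base
Dataset: SQuAD
Evaluation metrics:
- F1 score: 92.59
- Exact Match (EM): 86.45
Training framework: Hugging Face Transformers

For more on ModernBERT's capabilities, see the Hugging Face blog post.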

Performance Demonstration

Example 1: Short context

from transformers import pipeline

model_id = "rankyx/ModernBERT-QnA-base-squad"
question_answerer = pipeline("question-answering", model=model_id)

# Input
question = "What is the capital of France?"
context = "France's capital is Paris, known for its art, gastronomy, and culture."

# Get the answer
result = question_answerer(question=question, context=context)
print(result)

Predicted answer:

{'score': 0.9913662075996399, 'start': 19, 'end': 25, 'answer': ' Paris'}

Example 2: Long context

from transformers import pipeline

model_id = "rankyx/ModernBERT-QnA-base-squad"
question_answerer = pipeline("question-answering", model=model_id)

# Input
question = "What are the major achievements of Isaac Newton?"
context = """
Isaac Newton, born on January 4, 1643, was an English mathematician, physicist, astronomer, and author. He is widely recognized as one of the greatest mathematicians and most influential scientists of all time. Newton made groundbreaking contributions to many fields, including the laws of motion and universal gravitation. He also developed calculus independently, providing the mathematical foundation for classical mechanics. Additionally, Newton's work in optics led to the invention of the reflecting telescope.
"""

# Get the answer
result = question_answerer(question=question, context=context)
print(result)

Predicted answer:

{'score': 0.5126065015792847, 'start': 278, 'end': 323, 'answer': ' the laws of motion and universal gravitation'}

Example 3: Very long context

from transformers import pipeline

model_id = "rankyx/ModernBERT-QnA-base-squad"
question_answerer = pipeline("question-answering", model=model_id)

# Input
question = "Describe the primary focus of the United Nations."
context = """
The United Nations (UN) is an international organization founded in 1945. It is currently made up of 193 Member States. The mission and work of the United Nations are guided by the purposes and principles contained in its founding Charter. The UN is best known for its peacekeeping, peacebuilding, conflict prevention, and humanitarian assistance. It also works on promoting sustainable development, protecting human rights, upholding international law, and delivering humanitarian aid. Through its various specialized agencies, funds, and programs, the UN addresses issues ranging from health to education to climate change.
"""

# Get the answer
result = question_answerer(question=question, context=context)
print(result)

Predicted answer:

{'score': 0.08445773273706436, 'start': 269, 'end': 347, 'answer': ' peacekeeping, peacebuilding, conflict prevention, and humanitarian assistance'}
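
The three examples return scores of roughly 0.99, 0.51, and 0.08, so an application may want to reject low-confidence spans instead of showing them. The sketch below is one possible way to do that. The helper `answer_or_none` and the 0.3 cutoff are assumptions chosen for illustration, not values recommended by the model author, and the result dictionaries are copied from the outputs above rather than produced by a live model call.

```python
from typing import Optional


def answer_or_none(result: dict, min_score: float = 0.3) -> Optional[str]:
    """Return the trimmed answer text, or None when the score is below the cutoff."""
    if result["score"] < min_score:
        return None
    return result["answer"].strip()


# Results copied from Examples 1-3 above.
short_ctx = {"score": 0.9913662075996399, "start": 19, "end": 25, "answer": " Paris"}
long_ctx = {
    "score": 0.5126065015792847,
    "start": 278,
    "end": 323,
    "answer": " the laws of motion and universal gravitation",
}
very_long_ctx = {
    "score": 0.08445773273706436,
    "start": 269,
    "end": 347,
    "answer": " peacekeeping, peacebuilding, conflict prevention, and humanitarian assistance",
}

for result in (short_ctx, long_ctx, very_long_ctx):
    print(answer_or_none(result))
```

A suitable cutoff depends on the application and is best tuned on held-out questions.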

Fine-Tuning Process

The model was fine-tuned with the Hugging Face Transformers library and its official question-answering script.

Fine-tuning command

python run_qa.py \
  --model_name_or_path "answerdotai/ModernBERT-base" \
  --dataset_name squad \
  --do_train \
  --do_eval \
  --overwrite_output_dir \
  --per_device_train_batch_size 25 \
  --per_device_eval_batch_size 20 \
  --eval_strategy="steps" \
  --save_strategy="epoch" \
  --logging_steps 50 \
  --eval_steps 500 \
  --learning_rate 3e-5 \
  --warmup_ratio 0.1 \
  --weight_decay 0.01 \
  --doc_stride 128 \
  --max_seq_length 384 \
  --max_answer_length 128 \
  --num_train_epochs 2 \
  --run_name="ModernBERT-QnA-base-squad" \
  --output_dir="/path/to/output/directory"
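
The `--max_seq_length 384` and `--doc_stride 128` settings control how contexts longer than one window are split into overlapping chunks. The sketch below illustrates the idea on plain token counts. It is a simplification under stated assumptions: `sliding_windows` is a hypothetical helper, the stride is treated as the number of tokens shared by consecutive windows (which is how the tokenizer's `stride` argument is documented), and the question and special tokens that also occupy each real window are ignored.

```python
from typing import List, Tuple


def sliding_windows(n_tokens: int, window: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) token spans that cover n_tokens with overlapping windows."""
    if overlap >= window:
        raise ValueError("overlap must be smaller than window")
    if n_tokens <= 0:
        return []
    step = window - overlap  # how far each new window advances
    spans = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        start += step
    return spans


# A hypothetical 1000-token context with the values from the command above.
spans = sliding_windows(n_tokens=1000, window=384, overlap=128)
print(spans)
```

The overlap means an answer that straddles a window boundary still appears whole in at least one window, provided it is shorter than the overlap.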

If you have multiple GPUs and hit the error RuntimeError: Detected that you are using FX to symbolically trace a dynamo-optimized function, try launching the script with accelerate instead:

accelerate launch run_qa.py \
... other arguments

Results

Evaluation metrics

F1 score: 92.59
Exact Match (EM): 86.45
Training loss: 0.860
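
For readers unfamiliar with these two metrics, the sketch below shows how SQuAD-style Exact Match and token-level F1 are commonly computed for a single prediction. It follows the normalization used by the SQuAD v1.1 evaluation (lowercase, strip punctuation, drop the articles a/an/the, collapse whitespace), but it is a compact re-implementation for illustration, not the script that produced the numbers above. It returns values in the range 0 to 1, whereas the reported scores are percentages averaged over the evaluation set.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, remove punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match(" Paris", "Paris"))
print(f1_score("176 billion parameters", "176 billion"))
```

When a question has several reference answers, the usual convention is to take the maximum score over the references.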

Summary table

Property	Details
Model type	ModernBERT
Training data	SQuAD
Evaluation metrics	F1 score: 92.59; Exact Match (EM): 86.45; training loss: 0.860
Training framework	Hugging Face Transformers
Base model	answerdotai/ModernBERT-base

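🔧 Technical Details

The model was fine-tuned with the Hugging Face Transformers library and trained and evaluated on the SQuAD dataset. The fine-tuning command shown earlier used a per-device training batch size of 25, a learning rate of 3e-5, a warmup ratio of 0.1, a weight decay of 0.01, a maximum sequence length of 384 with a document stride of 128, and 2 training epochs.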
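
To make the warmup ratio concrete, the sketch below computes how many optimizer steps a 0.1 warmup ratio corresponds to and what the learning rate looks like along the way. Several things here are assumptions rather than facts from the training run: the schedule is taken to be linear warmup followed by linear decay, training is on a single device without gradient accumulation, and `num_features = 10_000` is a made-up count used only to keep the arithmetic easy to follow.

```python
import math


def linear_warmup_lr(step: int, total_steps: int, warmup_steps: int, peak_lr: float) -> float:
    """Learning rate under linear warmup to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


num_features = 10_000  # hypothetical number of training features, NOT the real SQuAD count
batch_size = 25        # --per_device_train_batch_size
epochs = 2             # --num_train_epochs
warmup_ratio = 0.1     # --warmup_ratio
peak_lr = 3e-5         # --learning_rate

steps_per_epoch = math.ceil(num_features / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(total_steps, warmup_steps)
for step in (0, warmup_steps // 2, warmup_steps, total_steps):
    print(step, linear_warmup_lr(step, total_steps, warmup_steps, peak_lr))
```

With these illustrative numbers the learning rate climbs for the first tenth of training, peaks at 3e-5, and then falls steadily to zero by the final step.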

📄 License

This model is released under the Apache 2.0 license. See LICENSE for details.

✍️ Citation

If you use this model in your research, please cite it as follows:

@misc{rankyx2024modernbertqna,
  title={ModernBERT-QnA-base-squad},
  author={Abu Bakr},
  year={2024},
  howpublished={\url{https://huggingface.co/rankyx/ModernBERT-QnA-base-squad}}
}