ModernBERT-QnA-base-squad: an open-source model for extractive question answering


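ModernBERT-QnA-base-squad

Developed by rankyx

A question-answering model fine-tuned from ModernBERT. It scores well on the SQuAD dataset and is intended for extractive question answering.

Task: question answering

Library: Transformers

Language: English · License: Apache-2.0 · Tags: #ExtractiveQA #HighF1 #SQuADOptimized

Downloads: 1,106

Released: 1/8/2025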

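Model overview

This model is a question-answering system fine-tuned from answerdotai/ModernBERT-base on the SQuAD dataset. Given a question and a context passage, it extracts the answer span from the context.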

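Model features

High-accuracy question answering

Reaches an F1 score of 92.59 and an exact match score of 86.45 on SQuAD.

Built on ModernBERT

Uses the ModernBERT architecture (answerdotai/ModernBERT-base) as its encoder.

Easy to integrate

Works with the Hugging Face Transformers question-answering pipeline, so it can be deployed with a few lines of code.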

Model capabilities

Extractive question answering

Text comprehension

Context analysis

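Use cases

Education

Reading comprehension aid

Helps students quickly locate the answer to a question in a text.

Intended benefit: faster study and better comprehension.

Customer service

Automatic FAQ answering

Extracts answers to customer questions from knowledge-base documents.

Intended benefit: less manual work for support staff.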

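🚀 ModernBERT-QnA-base-squad

ModernBERT-QnA-base-squad is a ModernBERT model fine-tuned for question answering. It performs well on the SQuAD dataset and is a good fit for extractive question-answering applications.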

🚀 Quick Start

Until the official PR is merged, you need to install a specific transformers fork before you can use the fine-tuned model for question answering.

> pip uninstall transformers -y
> git clone https://github.com/bakrianoo/transformers.git
> cd transformers && git checkout feat-ModernBert-QnA-Support && pip install -e .

Basic usage

from transformers.models.modernbert.modular_modernbert import ModernBertForQuestionAnswering
from transformers import AutoTokenizer, pipeline

# Load the model and tokenizer
model_id = "rankyx/ModernBERT-QnA-base-squad"
model = ModernBertForQuestionAnswering.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build the question-answering pipeline
question_answerer = pipeline("question-answering", model=model, tokenizer=tokenizer)

# Example input
question = "How many parameters does BLOOM contain?"
context = "BLOOM has 176 billion parameters and can generate text in 46 natural languages and 13 programming languages."

# Get the answer
result = question_answerer(question=question, context=context)
print(result)

Example output

{'score': 0.7719728946685791, 'start': 9, 'end': 21, 'answer': '176 billion'}
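
The `start`, `end`, and `score` fields above come from choosing an answer span inside the context. As a rough illustration of how an extractive QA head can pick that span, the sketch below assumes the model emits one start logit and one end logit per token and selects the pair with the highest combined logit, subject to a length limit. The function name `best_span` and the toy logits are hypothetical, and the actual pipeline's scoring and candidate pruning are more involved than this.

```python
from typing import List, Tuple


def best_span(start_logits: List[float], end_logits: List[float],
              max_answer_len: int) -> Tuple[int, int, float]:
    """Return (start, end, score) of the highest-scoring valid token span."""
    best = (0, 0, float("-inf"))
    for s, s_logit in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        last = min(s + max_answer_len, len(end_logits))
        for e in range(s, last):
            score = s_logit + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


# Toy logits for a 5-token context (made-up numbers, not model output).
start_logits = [0.1, 2.0, 0.3, -1.0, 0.0]
end_logits = [0.0, 0.5, 3.0, 0.2, -0.5]
print(best_span(start_logits, end_logits, max_answer_len=3))  # (1, 2, 5.0)
```

The brute-force double loop is quadratic in the number of tokens, which is fine for an illustration but not how one would do it at scale.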

✨ Key Features

High performance: on SQuAD, the model reaches an F1 score of 92.59 and an exact match score of 86.45.
Easy to use: it can be called through the Hugging Face pipeline API.

📚 Documentation

Model summary

Model ID: rankyx/ModernBERT-QnA-base-squad
Base model: answerdotai/ModernBERT-base
Dataset: SQuAD
Evaluation metrics:
- F1 score: 92.59
- Exact match: 86.45
Training framework: Hugging Face Transformers

For more on what ModernBERT can do, see the Hugging Face blog post.

Performance demonstration

Example 1: short context

from transformers import pipeline

model_id = "rankyx/ModernBERT-QnA-base-squad"
question_answerer = pipeline("question-answering", model=model_id)

# Input
question = "What is the capital of France?"
context = "France's capital is Paris, known for its art, gastronomy, and culture."

# Get the answer
result = question_answerer(question=question, context=context)
print(result)

Predicted answer:

{'score': 0.9913662075996399, 'start': 19, 'end': 25, 'answer': ' Paris'}
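
Note that the predicted answer above begins with a space, and `start` points at that space rather than at the first letter. If downstream code needs a clean string with matching offsets, one possible post-processing step is sketched below. It assumes only the result dictionary format shown above; `trim_answer` is a hypothetical helper, not part of Transformers.

```python
def trim_answer(result: dict) -> dict:
    """Strip surrounding whitespace from the answer and shift offsets to match."""
    answer = result["answer"]
    stripped = answer.strip()
    # Number of whitespace characters removed from the front.
    lead = len(answer) - len(answer.lstrip())
    start = result["start"] + lead
    return {**result, "answer": stripped, "start": start, "end": start + len(stripped)}


raw = {"score": 0.9913662075996399, "start": 19, "end": 25, "answer": " Paris"}
context = "France's capital is Paris, known for its art, gastronomy, and culture."
clean = trim_answer(raw)
print(clean["answer"], clean["start"], clean["end"])  # Paris 20 25
print(context[clean["start"]:clean["end"]])           # Paris
```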

Example 2: long context

from transformers import pipeline

model_id = "rankyx/ModernBERT-QnA-base-squad"
question_answerer = pipeline("question-answering", model=model_id)

# Input
question = "What are the major achievements of Isaac Newton?"
context = """
Isaac Newton, born on January 4, 1643, was an English mathematician, physicist, astronomer, and author. He is widely recognized as one of the greatest mathematicians and most influential scientists of all time. Newton made groundbreaking contributions to many fields, including the laws of motion and universal gravitation. He also developed calculus independently, providing the mathematical foundation for classical mechanics. Additionally, Newton's work in optics led to the invention of the reflecting telescope.
"""

# Get the answer
result = question_answerer(question=question, context=context)
print(result)

Predicted answer:

{'score': 0.5126065015792847, 'start': 278, 'end': 323, 'answer': ' the laws of motion and universal gravitation'}

Example 3: very long context

from transformers import pipeline

model_id = "rankyx/ModernBERT-QnA-base-squad"
question_answerer = pipeline("question-answering", model=model_id)

# Input
question = "Describe the primary focus of the United Nations."
context = """
The United Nations (UN) is an international organization founded in 1945. It is currently made up of 193 Member States. The mission and work of the United Nations are guided by the purposes and principles contained in its founding Charter. The UN is best known for its peacekeeping, peacebuilding, conflict prevention, and humanitarian assistance. It also works on promoting sustainable development, protecting human rights, upholding international law, and delivering humanitarian aid. Through its various specialized agencies, funds, and programs, the UN addresses issues ranging from health to education to climate change.
"""

# Get the answer
result = question_answerer(question=question, context=context)
print(result)

Predicted answer:

{'score': 0.08445773273706436, 'start': 269, 'end': 347, 'answer': ' peacekeeping, peacebuilding, conflict prevention, and humanitarian assistance'}
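
The score in this last example (about 0.08) is far lower than in the earlier ones, even though the extracted span is reasonable. An application may want to treat low-scoring answers as "no confident answer" instead of showing them. The sketch below is one simple way to do that; the helper name `answer_or_none` and the 0.3 threshold are illustrative assumptions, and a real threshold would need to be tuned on your own data.

```python
from typing import Optional


def answer_or_none(result: dict, min_score: float = 0.3) -> Optional[str]:
    """Return the stripped answer if its score clears min_score, else None."""
    if result["score"] < min_score:
        return None
    return result["answer"].strip()


# Results copied from Examples 2 and 3 above.
newton = {"score": 0.5126065015792847, "start": 278, "end": 323,
          "answer": " the laws of motion and universal gravitation"}
un = {"score": 0.08445773273706436, "start": 269, "end": 347,
      "answer": " peacekeeping, peacebuilding, conflict prevention, and humanitarian assistance"}

print(answer_or_none(newton))  # the laws of motion and universal gravitation
print(answer_or_none(un))      # None
```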

Fine-tuning procedure

The model was fine-tuned with the Hugging Face Transformers library using its official question-answering script.

Fine-tuning command

python run_qa.py \
  --model_name_or_path "answerdotai/ModernBERT-base" \
  --dataset_name squad \
  --do_train \
  --do_eval \
  --overwrite_output_dir \
  --per_device_train_batch_size 25 \
  --per_device_eval_batch_size 20 \
  --eval_strategy="steps" \
  --save_strategy="epoch" \
  --logging_steps 50 \
  --eval_steps 500 \
  --learning_rate 3e-5 \
  --warmup_ratio 0.1 \
  --weight_decay 0.01 \
  --doc_stride 128 \
  --max_seq_length 384 \
  --max_answer_length 128 \
  --num_train_epochs 2 \
  --run_name="ModernBERT-QnA-base-squad" \
  --output_dir="/path/to/output/directory"
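
To give a feel for what `--max_seq_length 384` and `--doc_stride 128` imply for long passages, the sketch below computes overlapping context windows. It is a simplified model, assuming that the stride value is the number of context tokens shared by consecutive windows and that the question plus special tokens consume a fixed part of each sequence. The function `window_spans` and the example token counts are hypothetical and do not reproduce the tokenizer's exact behavior.

```python
from typing import List, Tuple


def window_spans(num_tokens: int, capacity: int, overlap: int) -> List[Tuple[int, int]]:
    """Split num_tokens context tokens into windows of at most `capacity`
    tokens, where consecutive windows share `overlap` tokens."""
    if not 0 <= overlap < capacity:
        raise ValueError("overlap must be non-negative and smaller than capacity")
    spans = []
    start = 0
    while True:
        end = min(start + capacity, num_tokens)
        spans.append((start, end))
        if end == num_tokens:
            break
        # Advance by the part of the window that is not shared with the next one.
        start += capacity - overlap
    return spans


max_seq_length = 384
doc_stride = 128
question_and_special_tokens = 24  # assumed value, for illustration only
capacity = max_seq_length - question_and_special_tokens

# A hypothetical 600-token context needs three overlapping windows.
print(window_spans(600, capacity, doc_stride))  # [(0, 360), (232, 592), (464, 600)]
```

Because windows overlap, an answer that straddles the boundary of one window is usually fully contained in the next.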

If you have multiple GPUs and hit the error RuntimeError: Detected that you are using FX to symbolically trace a dynamo-optimized function, try launching the script with accelerate instead:

accelerate launch run_qa.py \
...other arguments

Results

Evaluation metrics

F1 score: 92.59
Exact match: 86.45
Training loss: 0.860
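
For readers unfamiliar with the two SQuAD metrics, the sketch below shows how exact match and token-level F1 are commonly computed for a single prediction against a single reference answer. It follows the usual SQuAD v1.1 normalization (lowercasing, removing punctuation and the articles a/an/the, collapsing whitespace). This is an illustration rather than the evaluation code used for this model; the reported figures are averages over the whole evaluation set, taking the best score across each question's reference answers and expressing it as a percentage.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1(prediction: str, truth: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match(" Paris", "Paris."))                         # 1.0
print(round(f1("176 billion parameters", "176 billion"), 2))   # 0.8
```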

Summary table

Property	Details
Model type	ModernBERT
Training data	SQuAD
Evaluation metrics	F1 score: 92.59; exact match: 86.45; training loss: 0.860
Training framework	Hugging Face Transformers
Base model	answerdotai/ModernBERT-base

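🔧 Technical Details

The model was fine-tuned and evaluated on SQuAD with the Hugging Face Transformers run_qa.py script. The fine-tuning command above uses a per-device training batch size of 25, a learning rate of 3e-5 with a warmup ratio of 0.1, a weight decay of 0.01, a maximum sequence length of 384 with a document stride of 128, and 2 training epochs.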
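
As an illustration of how the learning rate and warmup ratio interact, the sketch below computes a linear warmup followed by a linear decay to zero. It assumes the run used a linear schedule (the fine-tuning command does not set a scheduler type, so this is an assumption about the default), and the total step count of 1000 is a made-up number for the example. `linear_lr` is a hypothetical helper, not a library function.

```python
import math


def linear_lr(step: int, total_steps: int, peak_lr: float = 3e-5,
              warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step`: linear ramp to peak_lr, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 1000  # assumed value, for illustration only
for step in (0, 50, 100, 550, 1000):
    print(step, linear_lr(step, total_steps))
```

With these numbers the rate climbs for the first 100 steps, peaks at 3e-5, and then falls steadily until the final step.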

📄 License

This model is released under the Apache 2.0 license. See LICENSE for details.

✍️ Citation

If you use this model in your research, please cite it as follows:

@misc{rankyx2024modernbertqna,
  title={ModernBERT-QnA-base-squad},
  author={Abu Bakr},
  year={2024},
  howpublished={\url{https://huggingface.co/rankyx/ModernBERT-QnA-base-squad}}
}