🚀 ModernBERT-large-squad2-v0.1
This model is a fine-tuned version of answerdotai/ModernBERT-large on the rajpurkar/squad_v2 dataset. It was trained with a maximum sequence length of 8192, and trust_remote_code must be set to True to load it.
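As a minimal loading sketch (assumed usage, not taken verbatim from this card: the checkpoint is loaded through the standard Auto classes, which pick up the custom code shipped with the repository when trust_remote_code=True):

```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_name = "praise2112/ModernBERT-large-squad2-v0.1"

# trust_remote_code=True is required because the checkpoint ships custom model code.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForQuestionAnswering.from_pretrained(model_name, trust_remote_code=True)

# The model was fine-tuned with sequences up to 8192 tokens, so long contexts
# can be encoded with little or no truncation.
inputs = tokenizer("Why was RoPE used?", "RoPE enables long-context support.",
                   truncation="only_second", max_length=8192, return_tensors="pt")
outputs = model(**inputs)  # start_logits / end_logits over the question+context sequence
```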
🚀 Quick Start
```python
from transformers import pipeline

model_name = "praise2112/ModernBERT-large-squad2-v0.1"
# trust_remote_code=True is needed to load the custom model code (see note above)
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name, trust_remote_code=True)

context = """Model Summary
ModernBERT is a modernized bidirectional encoder-only Transformer model (BERT-style) pre-trained on 2 trillion tokens of English and code data with a native context length of up to 8,192 tokens. ModernBERT leverages recent architectural improvements such as:
Rotary Positional Embeddings (RoPE) for long-context support.
Local-Global Alternating Attention for efficiency on long inputs.
Unpadding and Flash Attention for efficient inference.
ModernBERT’s native long context length makes it ideal for tasks that require processing long documents, such as retrieval, classification, and semantic search within large corpora. The model was trained on a large corpus of text and code, making it suitable for a wide range of downstream tasks, including code retrieval and hybrid (text + code) semantic search.
It is available in the following sizes:
ModernBERT-base - 22 layers, 149 million parameters
ModernBERT-large - 28 layers, 395 million parameters
For more information about ModernBERT, we recommend our release blog post for a high-level overview, and our arXiv pre-print for in-depth information.
ModernBERT is a collaboration between Answer.AI, LightOn, and friends."""

question = "Why was RoPE used in ModernBERT?"
res = nlp(question=question, context=context, max_seq_len=8192)
```
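Because the model is fine-tuned on SQuAD v2, some questions are deliberately unanswerable from the context. A small follow-up sketch, reusing nlp and context from the snippet above (handle_impossible_answer is a standard option of the question-answering pipeline; the question below is illustrative and not from this card):

```python
# With handle_impossible_answer=True, the pipeline is allowed to return an empty
# answer when it judges the question unanswerable from the given context.
res = nlp(
    question="What license does ModernBERT use?",  # not answered by the context above
    context=context,
    max_seq_len=8192,
    handle_impossible_answer=True,
)
print(res["answer"], res["score"])  # an empty answer string means "no answer found"
```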
🔧 Technical Details
Training Hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning rate: 1e-05
- train batch size: 8
- eval batch size: 8
- seed: 42
- gradient accumulation steps: 8
- total train batch size: 64
- optimizer: ExtendedOptimizerNames.ADAMW_TORCH with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr scheduler type: linear
- lr scheduler warmup ratio: 0.1
- num epochs: 4
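For orientation, here is a minimal sketch of how these settings map onto transformers TrainingArguments. The actual training script is not part of this card, so output_dir (and anything else not in the list above) is a placeholder:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters reported above.
training_args = TrainingArguments(
    output_dir="modernbert-large-squad2",  # placeholder, not from the card
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,         # 8 x 8 = total train batch size of 64
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=4,
)
```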
Training Results

| Metric | Value |
|--------|-------|
| Eval exact match | 86.27 |
| Eval F1 | 89.30 |
Framework Versions
- Transformers 4.48.0.dev0
- Pytorch 2.5.1+cu124
- Datasets 2.20.0
- Tokenizers 0.21.0
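A quick environment check against the versions listed above (purely a convenience snippet, not required by the model):

```python
import datasets, tokenizers, torch, transformers

# Reported: Transformers 4.48.0.dev0, PyTorch 2.5.1+cu124, Datasets 2.20.0, Tokenizers 0.21.0
print("transformers", transformers.__version__)
print("torch       ", torch.__version__)
print("datasets    ", datasets.__version__)
print("tokenizers  ", tokenizers.__version__)
```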
📄 License
This project is licensed under the Apache-2.0 license.