🚀 ModernBERT-base-squad2-v0.2
This model is a fine-tuned version of tasksource/ModernBERT-base-nli on the rajpurkar/squad_v2 dataset, trained with a maximum sequence length of 8192. When loading the model, `trust_remote_code` must be set to `True`.
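As a minimal sketch of loading the checkpoint directly (it assumes the standard `transformers` Auto classes; only the model name and the `trust_remote_code` requirement come from this card):

```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_name = "praise2112/ModernBERT-base-squad2-v0.2"

# trust_remote_code=True is required because the checkpoint ships custom model code.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForQuestionAnswering.from_pretrained(model_name, trust_remote_code=True)
```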
🚀 Quick Start
Example code for using this model for question answering:
```python
from transformers import pipeline

model_name = "praise2112/ModernBERT-base-squad2-v0.2"

# Build a question-answering pipeline with the fine-tuned model and its tokenizer.
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)

context = """Model Summary
ModernBERT is a modernized bidirectional encoder-only Transformer model (BERT-style) pre-trained on 2 trillion tokens of English and code data with a native context length of up to 8,192 tokens. ModernBERT leverages recent architectural improvements such as:
Rotary Positional Embeddings (RoPE) for long-context support.
Local-Global Alternating Attention for efficiency on long inputs.
Unpadding and Flash Attention for efficient inference.
ModernBERT’s native long context length makes it ideal for tasks that require processing long documents, such as retrieval, classification, and semantic search within large corpora. The model was trained on a large corpus of text and code, making it suitable for a wide range of downstream tasks, including code retrieval and hybrid (text + code) semantic search.
It is available in the following sizes:
ModernBERT-base - 22 layers, 149 million parameters
ModernBERT-large - 28 layers, 395 million parameters
For more information about ModernBERT, we recommend our release blog post for a high-level overview, and our arXiv pre-print for in-depth information.
ModernBERT is a collaboration between Answer.AI, LightOn, and friends."""

question = "How many parameters does ModernBERT-base have?"

# max_seq_len lets the pipeline use the model's full 8192-token context window.
res = nlp(question=question, context=context, max_seq_len=8192)
```
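Since the model is fine-tuned on SQuAD v2, which includes unanswerable questions, you can let the pipeline return an empty answer when the context does not contain one. A small sketch reusing the `nlp` pipeline above (`handle_impossible_answer` is the standard question-answering pipeline flag; the question here is only illustrative):

```python
# Allow the pipeline to return an empty string for unanswerable questions.
res = nlp(
    question="What license is ModernBERT released under?",
    context=context,
    max_seq_len=8192,
    handle_impossible_answer=True,
)
print(res)  # dict with 'score', 'start', 'end', and 'answer' keys
```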
📚 Documentation
Training Procedure
Training Hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch with these values follows the list):
- Learning rate: 3e-05
- Train batch size: 32
- Eval batch size: 32
- Seed: 42
- Optimizer: ExtendedOptimizerNames.ADAMW_TORCH with betas=(0.9, 0.999), epsilon=1e-08 and no additional optimizer arguments
- LR scheduler type: linear
- LR scheduler warmup ratio: 0.1
- Number of epochs: 4
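For reference only, a minimal sketch of how these values map onto Hugging Face `TrainingArguments` (the output directory is an assumption; this is not the author's original training script):

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported hyperparameters, not the original script.
training_args = TrainingArguments(
    output_dir="ModernBERT-base-squad2-v0.2",  # assumed output path
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adamw_torch",                       # AdamW, betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=4,
)
```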
Training Results

| Metric | Value |
|--------|-------|
| Eval exact match | 83.9636 |
| Eval F1 | 87.0387 |
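These are the standard SQuAD v2 exact-match and F1 scores. A hedged sketch of computing them with the `evaluate` library (the example prediction and reference are illustrative only; the field names follow the `squad_v2` metric's schema):

```python
import evaluate

squad_v2_metric = evaluate.load("squad_v2")

# Each prediction carries an id, the predicted text, and a no-answer probability.
predictions = [
    {"id": "q1", "prediction_text": "149 million", "no_answer_probability": 0.0}
]
# References carry the gold answers; an empty answer list marks an unanswerable question.
references = [
    {"id": "q1", "answers": {"text": ["149 million"], "answer_start": [0]}}
]

results = squad_v2_metric.compute(predictions=predictions, references=references)
print(results["exact"], results["f1"])
```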
Framework Versions
- Transformers 4.48.0.dev0
- Pytorch 2.5.1+cu124
- Datasets 2.20.0
- Tokenizers 0.21.0
📄 License
This model is released under the Apache-2.0 license.
📦 Model Information

| Property | Details |
|----------|---------|
| Model name | ModernBERT-base-squad2-v0.2 |
| Base model | tasksource/ModernBERT-base-nli |
| Fine-tuning dataset | rajpurkar/squad_v2 |
| License | Apache-2.0 |