ModernBERT-large-squad2-v0.1: open-source question-answering model, free to deploy, with long-context QA support

Modernbert Large Squad2 V0.1

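Developed by Praise2112

A question-answering model fine-tuned from ModernBERT-large on the SQuAD 2.0 dataset, with support for long-context processing

Question Answering

Transformers

License: Apache-2.0 #long-text QA #RoPE positional encoding #8192-token long context

Downloads: 19

Released: 1/11/2025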

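Model Overview

This is a question-answering model built on the ModernBERT-large architecture and fine-tuned on the SQuAD 2.0 dataset. It is particularly suited to question answering over long documents and natively supports a context length of 8192 tokens.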

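Model Features

Long-context support

Natively supports a context length of 8192 tokens, which suits question answering over long documents

Efficient architecture

Uses rotary positional embeddings (RoPE) and alternating local-global attention to process long inputs more efficiently

High-performance question answering

Achieves an exact match score of 86.27 and an F1 score of 89.30 on the SQuAD 2.0 dataset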
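
For context on the two scores above: in SQuAD-style evaluation, exact match and F1 are computed per question after light text normalization. The following is a minimal sketch of that idea, not the official evaluation script. The function names `normalize`, `exact_match`, and `f1_score` are illustrative.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # SQuAD 2.0 convention: an empty string stands for "no answer".
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Reported figures of this kind are typically these per-question values averaged over the evaluation set and multiplied by 100; how the scores on this page were produced is not stated in the source.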

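Model Capabilities

Long-document question answering

Text comprehension

Information extraction

Use Cases

Document processing

Technical documentation QA

Extracts accurate answers from long technical documents

Can accurately answer technical questions about a document

Legal document analysis

Analyzes legal contracts and clauses

Can extract key information from complex legal text

Knowledge retrieval

Enterprise knowledge-base QA

Builds enterprise knowledge question-answering systems

Can process large volumes of enterprise documents and provide accurate answers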

🚀 ModernBERT-large-squad2-v0.1

This model is a fine-tuned version of answerdotai/ModernBERT-large on the rajpurkar/squad_v2 dataset. It was trained with a maximum sequence length of 8192, and loading the model requires trust_remote_code to be set to True.
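
When a context is longer than the maximum sequence length, it has to be split into overlapping windows. The question-answering pipeline in transformers handles this internally through its `max_seq_len` and `doc_stride` arguments, so the sketch below only illustrates the idea on token counts. The function `window_spans` and its parameters are hypothetical, and it ignores the tokens taken up by the question and special tokens.

```python
def window_spans(n_tokens: int, max_len: int = 8192, overlap: int = 128):
    """Return (start, end) token spans that cover n_tokens with overlapping windows."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    spans = []
    start = 0
    while True:
        end = min(start + max_len, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:
            break
        # Step forward, keeping `overlap` tokens shared with the previous window.
        start = end - overlap
    return spans


# A 20,000-token context needs three windows at the default settings.
print(window_spans(20000))
```

A context that fits within `max_len` yields a single span, so no splitting happens in that case.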

🚀 Quick Start

from transformers import pipeline

model_name = "praise2112/ModernBERT-large-squad2-v0.1"

# a) Get predictions
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)

context = """Model Summary
ModernBERT is a modernized bidirectional encoder-only Transformer model (BERT-style) pre-trained on 2 trillion tokens of English and code data with a native context length of up to 8,192 tokens. ModernBERT leverages recent architectural improvements such as:

Rotary Positional Embeddings (RoPE) for long-context support.
Local-Global Alternating Attention for efficiency on long inputs.
Unpadding and Flash Attention for efficient inference.
ModernBERT’s native long context length makes it ideal for tasks that require processing long documents, such as retrieval, classification, and semantic search within large corpora. The model was trained on a large corpus of text and code, making it suitable for a wide range of downstream tasks, including code retrieval and hybrid (text + code) semantic search.

It is available in the following sizes:

ModernBERT-base - 22 layers, 149 million parameters
ModernBERT-large - 28 layers, 395 million parameters
For more information about ModernBERT, we recommend our release blog post for a high-level overview, and our arXiv pre-print for in-depth information.

ModernBERT is a collaboration between Answer.AI, LightOn, and friends."""

question = "Why was RoPE used in ModernBERT?"

res = nlp(question=question, context=context, max_seq_len=8192)

# {'score': 0.5530015826225281, 'start': 309, 'end': 334, 'answer': ' for long-context support'}
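
Because SQuAD 2.0 includes unanswerable questions, it can be useful to let the pipeline return an empty answer instead of forcing a span. The sketch below wraps any question-answering callable. `answer_or_none` and the `min_score` threshold are hypothetical names chosen for illustration, while `handle_impossible_answer` is an existing argument of the transformers question-answering pipeline.

```python
from typing import Callable, Optional


def answer_or_none(qa: Callable[..., dict], question: str, context: str,
                   min_score: float = 0.0) -> Optional[str]:
    """Return the stripped answer text, or None when no answer is predicted."""
    result = qa(
        question=question,
        context=context,
        max_seq_len=8192,
        handle_impossible_answer=True,  # allow an empty "no answer" prediction
    )
    answer = result["answer"].strip()
    # Treat an empty span or a score below the (assumed) threshold as "no answer".
    if not answer or result["score"] < min_score:
        return None
    return answer
```

With the `nlp` pipeline created above, this would be called as `answer_or_none(nlp, question, context)`. A suitable `min_score` depends on the application and is not given in the source.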

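🔧 Technical Details

Training hyperparameters

The following hyperparameters were used during training:

Learning rate: 1e-05
Train batch size: 8
Eval batch size: 8
Seed: 42
Gradient accumulation steps: 8
Total train batch size: 64
Optimizer: ExtendedOptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08, no additional optimizer arguments
LR scheduler type: linear
LR scheduler warmup ratio: 0.1
Number of epochs: 4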
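
The total train batch size of 64 equals the train batch size times the gradient accumulation steps (8 × 8), which is consistent with training on a single device. The linear scheduler with a warmup ratio of 0.1 can be sketched as below. This is a minimal illustration, not the training code: the functions `effective_batch_size` and `linear_lr` are hypothetical, and the `total_steps` value in the usage line is an assumption, since the source does not state the number of optimizer steps.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, n_devices: int = 1) -> int:
    """Number of examples contributing to one optimizer step."""
    return per_device * grad_accum * n_devices


def linear_lr(step: int, total_steps: int, base_lr: float = 1e-5,
              warmup_ratio: float = 0.1) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print(effective_batch_size(8, 8))
# total_steps=1000 is an assumed value for illustration only.
print([linear_lr(s, total_steps=1000) for s in (0, 50, 100, 550, 1000)])
```

The learning rate peaks at 1e-05 once warmup ends and reaches zero at the final step.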