🚀 ModernBERT-large-squad2-v0.1
This model is a fine-tuned version of answerdotai/ModernBERT-large on the rajpurkar/squad_v2 dataset.
The maximum sequence length used during training was 8192.
To load the model, trust_remote_code must be set to True.
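For reference, here is a minimal sketch of loading the checkpoint directly with the standard transformers Auto classes; this snippet is an illustration, not taken from the original card.

```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_name = "praise2112/ModernBERT-large-squad2-v0.1"

# trust_remote_code=True is required to load this checkpoint, as noted above
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForQuestionAnswering.from_pretrained(model_name, trust_remote_code=True)
```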
🚀 Quick Start
You can run question answering with this model using the following code:
```python
from transformers import pipeline

model_name = "praise2112/ModernBERT-large-squad2-v0.1"

# trust_remote_code=True is required to load this model (see note above)
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name, trust_remote_code=True)

context = """Model Summary
ModernBERT is a modernized bidirectional encoder-only Transformer model (BERT-style) pre-trained on 2 trillion tokens of English and code data with a native context length of up to 8,192 tokens. ModernBERT leverages recent architectural improvements such as:
Rotary Positional Embeddings (RoPE) for long-context support.
Local-Global Alternating Attention for efficiency on long inputs.
Unpadding and Flash Attention for efficient inference.
ModernBERT’s native long context length makes it ideal for tasks that require processing long documents, such as retrieval, classification, and semantic search within large corpora. The model was trained on a large corpus of text and code, making it suitable for a wide range of downstream tasks, including code retrieval and hybrid (text + code) semantic search.
It is available in the following sizes:
ModernBERT-base - 22 layers, 149 million parameters
ModernBERT-large - 28 layers, 395 million parameters
For more information about ModernBERT, we recommend our release blog post for a high-level overview, and our arXiv pre-print for in-depth information.
ModernBERT is a collaboration between Answer.AI, LightOn, and friends."""

question = "Why was RoPE used in ModernBERT?"

# max_seq_len can be set up to 8192, the model's native context length
res = nlp(question=question, context=context, max_seq_len=8192)
```
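The pipeline returns a dict with the keys score, start, end, and answer. Since SQuAD v2 also contains unanswerable questions, you may want to let the pipeline return an empty answer when no span fits; the question-answering pipeline's handle_impossible_answer flag enables this. The sketch below is illustrative (the extra question is not from the card):

```python
# The result is a dict: {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
print(res["answer"], res["score"])

# SQuAD v2 includes unanswerable questions; with handle_impossible_answer=True
# the pipeline may return an empty answer instead of forcing a span.
res_no_answer = nlp(
    question="What year was ModernBERT released?",
    context=context,
    max_seq_len=8192,
    handle_impossible_answer=True,
)
```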
📚 Documentation
Training Procedure
Training Hyperparameters
The following hyperparameters were used during training (a sketch of the equivalent TrainingArguments follows the list):
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4
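These settings map onto transformers TrainingArguments roughly as follows; a minimal sketch, where output_dir (and anything else not listed above) is a placeholder rather than a value from the card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="modernbert-large-squad2",  # placeholder, not from the card
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,  # 8 * 8 = total train batch size of 64
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=4,
)
```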
Training Results

| Metric | Value |
|--------|-------|
| eval_exact | 86.27 |
| eval_f1 | 89.30 |
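eval_exact and eval_f1 are the standard SQuAD v2 exact-match and F1 scores. Below is a minimal sketch of how such scores are typically computed with the evaluate library's squad_v2 metric; it is illustrative only, not the exact evaluation script used for this card:

```python
import evaluate

squad_v2_metric = evaluate.load("squad_v2")

# Toy prediction/reference pair in the squad_v2 metric format
predictions = [{
    "id": "example-0",  # placeholder id
    "prediction_text": "long-context support",
    "no_answer_probability": 0.0,
}]
references = [{
    "id": "example-0",
    "answers": {"text": ["long-context support"], "answer_start": [57]},
}]

results = squad_v2_metric.compute(predictions=predictions, references=references)
print(results["exact"], results["f1"])  # same metrics as reported above
```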
Framework Versions
- Transformers 4.48.0.dev0
- Pytorch 2.5.1+cu124
- Datasets 2.20.0
- Tokenizers 0.21.0
📄 License
This model is released under the Apache 2.0 license.