🚀 ModernBERT-base-squad2-v0.2
This model is a fine-tuned version of tasksource/ModernBERT-base-nli on the rajpurkar/squad_v2 dataset.
The maximum sequence length used during training was 8192.
To load the model, trust_remote_code must be set to True.
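If you load the model outside of a pipeline, the flag goes directly into the from_pretrained calls. This is a minimal sketch using the standard transformers Auto classes; only the trust_remote_code=True requirement comes from this card.

```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_name = "praise2112/ModernBERT-base-squad2-v0.2"

# trust_remote_code=True is required to load this model (see the note above)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForQuestionAnswering.from_pretrained(model_name, trust_remote_code=True)
```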
🚀 Quick Start
```python
from transformers import pipeline

model_name = "praise2112/ModernBERT-base-squad2-v0.2"

# trust_remote_code=True is required to load this model (see the note above)
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name, trust_remote_code=True)

context = """Model Summary
ModernBERT is a modernized bidirectional encoder-only Transformer model (BERT-style) pre-trained on 2 trillion tokens of English and code data with a native context length of up to 8,192 tokens. ModernBERT leverages recent architectural improvements such as:
Rotary Positional Embeddings (RoPE) for long-context support.
Local-Global Alternating Attention for efficiency on long inputs.
Unpadding and Flash Attention for efficient inference.
ModernBERT’s native long context length makes it ideal for tasks that require processing long documents, such as retrieval, classification, and semantic search within large corpora. The model was trained on a large corpus of text and code, making it suitable for a wide range of downstream tasks, including code retrieval and hybrid (text + code) semantic search.
It is available in the following sizes:
ModernBERT-base - 22 layers, 149 million parameters
ModernBERT-large - 28 layers, 395 million parameters
For more information about ModernBERT, we recommend our release blog post for a high-level overview, and our arXiv pre-print for in-depth information.
ModernBERT is a collaboration between Answer.AI, LightOn, and friends."""

question = "How many parameters does ModernBERT-base have?"

# max_seq_len matches the maximum sequence length used during training (8192)
res = nlp(question=question, context=context, max_seq_len=8192)

# res is a dict with 'score', 'start', 'end', and 'answer' keys
print(res)
```
💻 Usage Examples
Basic Usage
Basic usage is identical to the Quick Start example above: pass the question and the context to the question-answering pipeline, setting max_seq_len=8192 for long contexts.
📚 Documentation
Training Procedure
Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: ExtendedOptimizerNames.ADAMW_TORCH with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4
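As a reference point, these settings map onto the Hugging Face Trainer configuration roughly as follows. This is a hedged sketch, not the exact training script; the output_dir value is a placeholder, and argument names follow the standard transformers TrainingArguments API.

```python
from transformers import TrainingArguments

# Illustrative mapping of the reported hyperparameters onto TrainingArguments
training_args = TrainingArguments(
    output_dir="modernbert-base-squad2",   # placeholder, not from the original run
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adamw_torch",                   # AdamW with betas=(0.9, 0.999), epsilon=1e-08
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=4,
)
```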
Training Results

| Metric | Value |
|--------|-------|
| eval_exact | 83.9636 |
| eval_f1 | 87.0387 |
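These are the standard SQuAD v2 exact-match and F1 scores. The sketch below shows how such numbers are typically computed with the evaluate library; the example id, prediction, and reference are purely illustrative and not taken from the actual evaluation run, which uses the full squad_v2 validation split.

```python
import evaluate

# Standard SQuAD v2 metric; compute() reports 'exact' and 'f1', among others
squad_v2_metric = evaluate.load("squad_v2")

# Illustrative prediction/reference pair (hypothetical id and texts)
predictions = [
    {"id": "example-1", "prediction_text": "149 million", "no_answer_probability": 0.0}
]
references = [
    {"id": "example-1", "answers": {"text": ["149 million"], "answer_start": [0]}}
]

results = squad_v2_metric.compute(predictions=predictions, references=references)
print(results["exact"], results["f1"])
```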
Framework Versions
- Transformers 4.48.0.dev0
- Pytorch 2.5.1+cu124
- Datasets 2.20.0
- Tokenizers 0.21.0
📄 License
This model is released under the Apache-2.0 license.