🚀 ModernBERT-base-squad2-v0.2
This model is a fine-tuned version of tasksource/ModernBERT-base-nli on the rajpurkar/squad_v2 dataset.
The maximum sequence length used during training was 8192.
To load the model, trust_remote_code must be set to True.
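If you load the model outside of a pipeline, the flag goes directly into the from_pretrained calls. This is a minimal sketch using the standard transformers Auto classes; only the trust_remote_code=True requirement comes from this card.

```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_name = "praise2112/ModernBERT-base-squad2-v0.2"

# trust_remote_code=True is required to load this model (see the note above)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForQuestionAnswering.from_pretrained(model_name, trust_remote_code=True)
```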
🚀 Quick Start
```python
from transformers import pipeline

model_name = "praise2112/ModernBERT-base-squad2-v0.2"

# trust_remote_code=True is required to load this model (see the note above)
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name, trust_remote_code=True)

context = """Model Summary
ModernBERT is a modernized bidirectional encoder-only Transformer model (BERT-style) pre-trained on 2 trillion tokens of English and code data with a native context length of up to 8,192 tokens. ModernBERT leverages recent architectural improvements such as:
Rotary Positional Embeddings (RoPE) for long-context support.
Local-Global Alternating Attention for efficiency on long inputs.
Unpadding and Flash Attention for efficient inference.
ModernBERT’s native long context length makes it ideal for tasks that require processing long documents, such as retrieval, classification, and semantic search within large corpora. The model was trained on a large corpus of text and code, making it suitable for a wide range of downstream tasks, including code retrieval and hybrid (text + code) semantic search.
It is available in the following sizes:
ModernBERT-base - 22 layers, 149 million parameters
ModernBERT-large - 28 layers, 395 million parameters
For more information about ModernBERT, we recommend our release blog post for a high-level overview, and our arXiv pre-print for in-depth information.
ModernBERT is a collaboration between Answer.AI, LightOn, and friends."""

question = "How many parameters does ModernBERT-base have?"

# max_seq_len matches the maximum sequence length used during training (8192)
res = nlp(question=question, context=context, max_seq_len=8192)

# res is a dict with 'score', 'start', 'end', and 'answer' keys
print(res)
```
💻 Usage Examples
Basic Usage
Basic usage is identical to the Quick Start example above: pass the question and the context to the question-answering pipeline, setting max_seq_len=8192 for long contexts.
📚 Documentation
Training Procedure
Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: ExtendedOptimizerNames.ADAMW_TORCH with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4
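As a reference point, these settings map onto the Hugging Face Trainer configuration roughly as follows. This is a hedged sketch, not the exact training script; the output_dir value is a placeholder, and argument names follow the standard transformers TrainingArguments API.

```python
from transformers import TrainingArguments

# Illustrative mapping of the reported hyperparameters onto TrainingArguments
training_args = TrainingArguments(
    output_dir="modernbert-base-squad2",   # placeholder, not from the original run
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adamw_torch",                   # AdamW with betas=(0.9, 0.999), epsilon=1e-08
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=4,
)
```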
Training Results

| Metric | Value |
|--------|-------|
| eval_exact | 83.9636 |
| eval_f1 | 87.0387 |
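These are the standard SQuAD v2 exact-match and F1 scores. The sketch below shows how such numbers are typically computed with the evaluate library; the example id, prediction, and reference are purely illustrative and not taken from the actual evaluation run, which uses the full squad_v2 validation split.

```python
import evaluate

# Standard SQuAD v2 metric; compute() reports 'exact' and 'f1', among others
squad_v2_metric = evaluate.load("squad_v2")

# Illustrative prediction/reference pair (hypothetical id and texts)
predictions = [
    {"id": "example-1", "prediction_text": "149 million", "no_answer_probability": 0.0}
]
references = [
    {"id": "example-1", "answers": {"text": ["149 million"], "answer_start": [0]}}
]

results = squad_v2_metric.compute(predictions=predictions, references=references)
print(results["exact"], results["f1"])
```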
Framework Versions
- Transformers 4.48.0.dev0
- Pytorch 2.5.1+cu124
- Datasets 2.20.0
- Tokenizers 0.21.0
📄 License
This model is released under the Apache-2.0 license.