🚀 ModernBERT-base-squad2-v0.2
This model is a fine-tuned version of tasksource/ModernBERT-base-nli on the rajpurkar/squad_v2 dataset, trained with a maximum sequence length of 8192. When loading the model, `trust_remote_code` must be set to `True`.
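As a minimal sketch of loading the checkpoint directly (it assumes the standard `transformers` Auto classes; only the model name and the `trust_remote_code` requirement come from this card):

```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_name = "praise2112/ModernBERT-base-squad2-v0.2"

# trust_remote_code=True is required because the checkpoint ships custom model code.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForQuestionAnswering.from_pretrained(model_name, trust_remote_code=True)
```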
🚀 Quick Start
Example code for using this model for question answering:
```python
from transformers import pipeline

model_name = "praise2112/ModernBERT-base-squad2-v0.2"

# Build a question-answering pipeline with the fine-tuned model and its tokenizer.
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)

context = """Model Summary
ModernBERT is a modernized bidirectional encoder-only Transformer model (BERT-style) pre-trained on 2 trillion tokens of English and code data with a native context length of up to 8,192 tokens. ModernBERT leverages recent architectural improvements such as:
Rotary Positional Embeddings (RoPE) for long-context support.
Local-Global Alternating Attention for efficiency on long inputs.
Unpadding and Flash Attention for efficient inference.
ModernBERT’s native long context length makes it ideal for tasks that require processing long documents, such as retrieval, classification, and semantic search within large corpora. The model was trained on a large corpus of text and code, making it suitable for a wide range of downstream tasks, including code retrieval and hybrid (text + code) semantic search.
It is available in the following sizes:
ModernBERT-base - 22 layers, 149 million parameters
ModernBERT-large - 28 layers, 395 million parameters
For more information about ModernBERT, we recommend our release blog post for a high-level overview, and our arXiv pre-print for in-depth information.
ModernBERT is a collaboration between Answer.AI, LightOn, and friends."""

question = "How many parameters does ModernBERT-base have?"

# max_seq_len lets the pipeline use the model's full 8192-token context window.
res = nlp(question=question, context=context, max_seq_len=8192)
```
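Since the model is fine-tuned on SQuAD v2, which includes unanswerable questions, you can let the pipeline return an empty answer when the context does not contain one. A small sketch reusing the `nlp` pipeline above (`handle_impossible_answer` is the standard question-answering pipeline flag; the question here is only illustrative):

```python
# Allow the pipeline to return an empty string for unanswerable questions.
res = nlp(
    question="What license is ModernBERT released under?",
    context=context,
    max_seq_len=8192,
    handle_impossible_answer=True,
)
print(res)  # dict with 'score', 'start', 'end', and 'answer' keys
```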
📚 Documentation
Training Procedure
Training Hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch with these values follows the list):
- Learning rate: 3e-05
- Train batch size: 32
- Eval batch size: 32
- Seed: 42
- Optimizer: ExtendedOptimizerNames.ADAMW_TORCH with betas=(0.9, 0.999), epsilon=1e-08 and no additional optimizer arguments
- LR scheduler type: linear
- LR scheduler warmup ratio: 0.1
- Number of epochs: 4
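For reference only, a minimal sketch of how these values map onto Hugging Face `TrainingArguments` (the output directory is an assumption; this is not the author's original training script):

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported hyperparameters, not the original script.
training_args = TrainingArguments(
    output_dir="ModernBERT-base-squad2-v0.2",  # assumed output path
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adamw_torch",                       # AdamW, betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=4,
)
```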
Training Results

| Metric | Value |
|--------|-------|
| Eval exact match | 83.9636 |
| Eval F1 | 87.0387 |
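These are the standard SQuAD v2 exact-match and F1 scores. A hedged sketch of computing them with the `evaluate` library (the example prediction and reference are illustrative only; the field names follow the `squad_v2` metric's schema):

```python
import evaluate

squad_v2_metric = evaluate.load("squad_v2")

# Each prediction carries an id, the predicted text, and a no-answer probability.
predictions = [
    {"id": "q1", "prediction_text": "149 million", "no_answer_probability": 0.0}
]
# References carry the gold answers; an empty answer list marks an unanswerable question.
references = [
    {"id": "q1", "answers": {"text": ["149 million"], "answer_start": [0]}}
]

results = squad_v2_metric.compute(predictions=predictions, references=references)
print(results["exact"], results["f1"])
```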
Framework Versions
- Transformers 4.48.0.dev0
- Pytorch 2.5.1+cu124
- Datasets 2.20.0
- Tokenizers 0.21.0
📄 License
This model is released under the Apache-2.0 license.
📦 Model Information

| Property | Details |
|----------|---------|
| Model name | ModernBERT-base-squad2-v0.2 |
| Base model | tasksource/ModernBERT-base-nli |
| Fine-tuning dataset | rajpurkar/squad_v2 |
| License | Apache-2.0 |