🚀 ModernBERT-large-squad2-v0.1
This model is a fine-tuned version of answerdotai/ModernBERT-large on the rajpurkar/squad_v2 dataset. It was trained with a maximum sequence length of 8192, and trust_remote_code must be set to True to load it.
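As a minimal loading sketch (assumed usage, not taken verbatim from this card: the checkpoint is loaded through the standard Auto classes, which pick up the custom code shipped with the repository when trust_remote_code=True):

```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_name = "praise2112/ModernBERT-large-squad2-v0.1"

# trust_remote_code=True is required because the checkpoint ships custom model code.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForQuestionAnswering.from_pretrained(model_name, trust_remote_code=True)

# The model was fine-tuned with sequences up to 8192 tokens, so long contexts
# can be encoded with little or no truncation.
inputs = tokenizer("Why was RoPE used?", "RoPE enables long-context support.",
                   truncation="only_second", max_length=8192, return_tensors="pt")
outputs = model(**inputs)  # start_logits / end_logits over the question+context sequence
```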
🚀 Quick Start
```python
from transformers import pipeline

model_name = "praise2112/ModernBERT-large-squad2-v0.1"
# trust_remote_code=True is needed to load the custom model code (see note above)
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name, trust_remote_code=True)

context = """Model Summary
ModernBERT is a modernized bidirectional encoder-only Transformer model (BERT-style) pre-trained on 2 trillion tokens of English and code data with a native context length of up to 8,192 tokens. ModernBERT leverages recent architectural improvements such as:
Rotary Positional Embeddings (RoPE) for long-context support.
Local-Global Alternating Attention for efficiency on long inputs.
Unpadding and Flash Attention for efficient inference.
ModernBERT’s native long context length makes it ideal for tasks that require processing long documents, such as retrieval, classification, and semantic search within large corpora. The model was trained on a large corpus of text and code, making it suitable for a wide range of downstream tasks, including code retrieval and hybrid (text + code) semantic search.
It is available in the following sizes:
ModernBERT-base - 22 layers, 149 million parameters
ModernBERT-large - 28 layers, 395 million parameters
For more information about ModernBERT, we recommend our release blog post for a high-level overview, and our arXiv pre-print for in-depth information.
ModernBERT is a collaboration between Answer.AI, LightOn, and friends."""

question = "Why was RoPE used in ModernBERT?"
res = nlp(question=question, context=context, max_seq_len=8192)
```
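Because the model is fine-tuned on SQuAD v2, some questions are deliberately unanswerable from the context. A small follow-up sketch, reusing nlp and context from the snippet above (handle_impossible_answer is a standard option of the question-answering pipeline; the question below is illustrative and not from this card):

```python
# With handle_impossible_answer=True, the pipeline is allowed to return an empty
# answer when it judges the question unanswerable from the given context.
res = nlp(
    question="What license does ModernBERT use?",  # not answered by the context above
    context=context,
    max_seq_len=8192,
    handle_impossible_answer=True,
)
print(res["answer"], res["score"])  # an empty answer string means "no answer found"
```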
🔧 Technical Details
Training Hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning rate: 1e-05
- train batch size: 8
- eval batch size: 8
- seed: 42
- gradient accumulation steps: 8
- total train batch size: 64
- optimizer: ExtendedOptimizerNames.ADAMW_TORCH with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr scheduler type: linear
- lr scheduler warmup ratio: 0.1
- num epochs: 4
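For orientation, here is a minimal sketch of how these settings map onto transformers TrainingArguments. The actual training script is not part of this card, so output_dir (and anything else not in the list above) is a placeholder:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters reported above.
training_args = TrainingArguments(
    output_dir="modernbert-large-squad2",  # placeholder, not from the card
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,         # 8 x 8 = total train batch size of 64
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=4,
)
```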
Training Results

| Metric | Value |
|--------|-------|
| Eval exact match | 86.27 |
| Eval F1 | 89.30 |
Framework Versions
- Transformers 4.48.0.dev0
- Pytorch 2.5.1+cu124
- Datasets 2.20.0
- Tokenizers 0.21.0
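A quick environment check against the versions listed above (purely a convenience snippet, not required by the model):

```python
import datasets, tokenizers, torch, transformers

# Reported: Transformers 4.48.0.dev0, PyTorch 2.5.1+cu124, Datasets 2.20.0, Tokenizers 0.21.0
print("transformers", transformers.__version__)
print("torch       ", torch.__version__)
print("datasets    ", datasets.__version__)
print("tokenizers  ", tokenizers.__version__)
```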
📄 License
This project is licensed under the Apache-2.0 license.