ModernBERT-large-squad2-v0.1 Open-source Question Answering Model - Free Deployment and Support for Long Context Question Answering

ModernBERT-large-squad2-v0.1

Developed by Praise2112

A question-answering model based on ModernBERT-large and fine-tuned on the SQuAD 2.0 dataset, with support for long-context processing

Question Answering System

Transformers

Open Source License: Apache-2.0 #Long-form QA #RoPE Positional Encoding #8192-token Context

Downloads 19

Release Time: 1/11/2025

Model Overview

This question-answering model is based on the ModernBERT-large architecture and fine-tuned on the SQuAD 2.0 dataset. It natively supports a context length of 8192 tokens, which makes it well suited to long-document QA tasks.
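A document longer than the 8192-token window still has to be split into overlapping chunks before it can be scored. The following is a minimal sketch of that idea, not the model's or the library's actual implementation: the function name `sliding_windows` and the overlap of 128 tokens are assumptions made for illustration, and the sketch ignores the tokens taken up by the question itself. The Transformers question-answering pipeline does its own chunking internally.

```python
def sliding_windows(n_tokens, max_len=8192, stride=128):
    """Return (start, end) token ranges that cover n_tokens.

    Consecutive windows overlap by `stride` tokens so that an answer
    lying on a chunk boundary is fully contained in at least one window.
    `sliding_windows` is a hypothetical helper, not a library function.
    """
    if stride >= max_len:
        raise ValueError("stride must be smaller than max_len")
    windows = []
    start = 0
    while True:
        end = min(start + max_len, n_tokens)
        windows.append((start, end))
        if end == n_tokens:
            break
        # Step back by `stride` tokens to create the overlap.
        start = end - stride
    return windows


short_doc = sliding_windows(5000)   # fits in a single window
long_doc = sliding_windows(20000)   # needs three overlapping windows
```

With an 8192-token window, a 5000-token document needs no splitting at all, which is the practical benefit of the long native context.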

Model Features

Long-context Support

Native support for 8192-token context length, suitable for long-document QA

Efficient Architecture

Uses Rotary Position Embedding (RoPE) and a local-global alternating attention mechanism to process long inputs efficiently

High-performance QA

Achieves an exact match score of 86.27 and an F1 score of 89.30 on the SQuAD 2.0 dataset
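The feature list mentions Rotary Position Embedding. As a rough illustration of why RoPE suits long inputs, the sketch below rotates pairs of vector components by an angle proportional to the token position, so that the dot product between a query and a key depends only on their relative distance. This is a pure-Python toy, not the model's implementation; the helper names `rope` and `dot` are hypothetical, and the base of 10000 is the common default from the RoPE literature, assumed here rather than taken from this model's configuration.

```python
import math


def rope(vec, pos, base=10000.0):
    """Apply a rotary position embedding to an even-length vector.

    Each consecutive pair (vec[i], vec[i + 1]) is rotated by the angle
    pos * base ** (-i / d), so later pairs rotate more slowly.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Both pairs of positions are 3 tokens apart, so the scores should agree.
score_near = dot(rope(q, 5), rope(k, 2))
score_far = dot(rope(q, 105), rope(k, 102))
```

Because the attention score is a function of the offset between positions rather than of the absolute positions, the same learned behavior applies anywhere in a long sequence.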

Model Capabilities

Long-document QA

Text Understanding

Information Extraction

Use Cases

Document Processing

Technical Document QA

Extract precise answers from long technical documents

Can accurately answer technical questions from documents

Legal Document Analysis

Analyze legal contracts and clauses

Can extract key information from complex legal texts

Knowledge Retrieval

Enterprise Knowledge Base QA

Build enterprise knowledge QA systems

Can process large volumes of enterprise documents and provide accurate answers

🚀 ModernBERT-large-squad2-v0.1

This model is a fine-tuned version of answerdotai/ModernBERT-large on the rajpurkar/squad_v2 dataset, trained with a maximum sequence length of 8192. Loading the model requires trust_remote_code to be set to True.

🚀 Quick Start

from transformers import pipeline

model_name = "praise2112/ModernBERT-large-squad2-v0.1"

# Build a question-answering pipeline from the fine-tuned checkpoint
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)

context = """Model Summary
ModernBERT is a modernized bidirectional encoder-only Transformer model (BERT-style) pre-trained on 2 trillion tokens of English and code data with a native context length of up to 8,192 tokens. ModernBERT leverages recent architectural improvements such as:

Rotary Positional Embeddings (RoPE) for long-context support.
Local-Global Alternating Attention for efficiency on long inputs.
Unpadding and Flash Attention for efficient inference.
ModernBERT’s native long context length makes it ideal for tasks that require processing long documents, such as retrieval, classification, and semantic search within large corpora. The model was trained on a large corpus of text and code, making it suitable for a wide range of downstream tasks, including code retrieval and hybrid (text + code) semantic search.

It is available in the following sizes:

ModernBERT-base - 22 layers, 149 million parameters
ModernBERT-large - 28 layers, 395 million parameters
For more information about ModernBERT, we recommend our release blog post for a high-level overview, and our arXiv pre-print for in-depth information.

ModernBERT is a collaboration between Answer.AI, LightOn, and friends."""

question = "Why was RoPE used in ModernBERT?"

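# max_seq_len=8192 lets the pipeline use the model's full context window
res = nlp(question=question, context=context, max_seq_len=8192)
print(res)

# Example output:
# {'score': 0.5530015826225281, 'start': 309, 'end': 334, 'answer': ' for long-context support'}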
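The example output above contains a leading space in the answer, and SQuAD 2.0 includes questions that have no answer in the context. One possible way to post-process a result is sketched below, under stated assumptions: `clean_answer` is a hypothetical helper, and the `min_score` threshold of 0.1 is an arbitrary illustrative value, not one recommended by the model author.

```python
from typing import Optional


def clean_answer(result: dict, min_score: float = 0.1) -> Optional[dict]:
    """Trim whitespace from a QA result and drop low-confidence answers.

    Returns None when the answer is empty or scores below `min_score`
    (an assumed threshold); otherwise returns a copy whose character
    offsets are adjusted to match the trimmed answer text.
    """
    answer = result["answer"]
    if not answer.strip() or result["score"] < min_score:
        return None
    lead = len(answer) - len(answer.lstrip())
    trail = len(answer) - len(answer.rstrip())
    return {
        "score": result["score"],
        "start": result["start"] + lead,
        "end": result["end"] - trail,
        "answer": answer.strip(),
    }


# The result dictionary shown in the Quick Start example.
res = {'score': 0.5530015826225281, 'start': 309, 'end': 334,
       'answer': ' for long-context support'}
cleaned = clean_answer(res)
```

The adjusted `start` and `end` values still index into the original context string, so `context[start:end]` should yield the trimmed answer.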

📚 Documentation

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 64
optimizer: ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08, no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 4
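Two of the values above follow from the others: the total batch size is the per-device batch size multiplied by the gradient accumulation steps, and the linear scheduler with a warmup ratio of 0.1 ramps the learning rate up over the first 10% of steps before decaying it to zero. The sketch below illustrates both. It assumes a single device, the function name `linear_warmup_decay` is hypothetical, and the step count of 1000 is a made-up figure, since the source does not state the number of training steps.

```python
def linear_warmup_decay(step, total_steps, warmup_ratio=0.1, peak_lr=1e-5):
    """Learning rate at `step` for a linear schedule with linear warmup."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to peak_lr during warmup.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr down to 0 over the remaining steps.
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


train_batch_size = 8
gradient_accumulation_steps = 8
# Effective batch size per optimizer step, assuming one device.
total_train_batch_size = train_batch_size * gradient_accumulation_steps

total_steps = 1000  # hypothetical; not stated in the source
lr_at_start = linear_warmup_decay(0, total_steps)
lr_at_peak = linear_warmup_decay(100, total_steps)
lr_at_end = linear_warmup_decay(total_steps, total_steps)
```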

Training results

Metric	Value
eval_exact	86.27
eval_f1	89.30
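For readers unfamiliar with the two metrics, the sketch below shows how exact match and token-level F1 are typically computed for a single prediction in SQuAD-style evaluation: both strings are lowercased, stripped of punctuation and English articles, and then compared. This is a simplified illustration, not the evaluation script used to produce the numbers above; the function names are hypothetical, and reported scores are averaged over the evaluation set and expressed as percentages.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred, gold):
    return float(normalize(pred) == normalize(gold))


def f1_score(pred, gold):
    pred_tokens = normalize(pred).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # For unanswerable questions both sides are empty, which counts as correct.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


em = exact_match(" for long-context support", "For long-context support.")
partial = f1_score("long-context support", "for long-context support")
```

Exact match rewards only predictions that are identical after normalization, while F1 gives partial credit for overlapping tokens, which is why the F1 score is the higher of the two.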

Framework versions

Transformers 4.48.0.dev0
PyTorch 2.5.1+cu124
Datasets 2.20.0
Tokenizers 0.21.0

📄 License

This model is licensed under the Apache-2.0 license.
