Bert Base Uncased Squad1.1 Block Sparse 0.20 V1
This is a pruned and optimized BERT question-answering model that retains 38.1% of the original model's weights. It is fine-tuned on the SQuAD1.1 dataset and supports English Q&A tasks.
Release Time: 3/2/2022
Model Overview
A block-sparse question-answering model based on the BERT-base uncased architecture and compressed with Movement Pruning. Inference runs 1.39x faster than the original dense model, making it suitable for English Q&A systems.
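For reference, the checkpoint can be loaded like any other BERT question-answering model through the standard transformers pipeline. The snippet below is a minimal sketch; the Hub repository ID is an assumption derived from the model name and should be adjusted if the actual path differs.

```python
# Minimal usage sketch: question answering with the pruned checkpoint via the
# standard transformers pipeline. The Hub ID below is an assumption based on
# the model name; replace it with the actual repository path if it differs.
from transformers import pipeline

MODEL_ID = "madlag/bert-base-uncased-squad1.1-block-sparse-0.20-v1"  # assumed ID

qa = pipeline("question-answering", model=MODEL_ID, tokenizer=MODEL_ID)

result = qa(
    question="What architecture is the model based on?",
    context="The model is a block-sparse question answering model based on the "
            "BERT-base uncased architecture, fine-tuned on SQuAD1.1.",
)
print(result["answer"], result["score"])
```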
Model Features
Efficient Block Sparse Structure
Retains only 20.2% of the linear-layer weights (38.1% of the original model's parameters overall), significantly reducing model size; a verification sketch follows this list
Fast Inference
Runs 1.39x faster than the dense network when used with a block-sparse runtime
Attention Head Optimization
Removes 90 of the 144 attention heads (62.5%), improving computational efficiency
Knowledge Distillation
Distilled from the csarron/bert-base-uncased-squad-v1 model, maintaining accuracy close to the teacher
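The sparsity claim above can be checked directly by counting nonzero weights in the encoder's linear layers. This is a minimal sketch using only standard PyTorch and transformers calls; the Hub ID is the same assumption as in the earlier example.

```python
# Minimal sketch: measuring how many encoder linear-layer weights survive
# pruning. Only the dense projections inside the encoder (attention + FFN)
# are counted, matching the "linear layer weights" figure in the model card.
import torch
from transformers import AutoModelForQuestionAnswering

MODEL_ID = "madlag/bert-base-uncased-squad1.1-block-sparse-0.20-v1"  # assumed ID
model = AutoModelForQuestionAnswering.from_pretrained(MODEL_ID)

total, nonzero = 0, 0
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear) and "encoder" in name:
        w = module.weight.detach()
        total += w.numel()
        nonzero += int((w != 0).sum())

print(f"Nonzero encoder linear weights: {nonzero / total:.1%}")  # card claims ~20.2%
```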
Model Capabilities
English Q&A
Text Understanding
Answer Extraction
Use Cases
Customer Support
Product Knowledge Q&A
Building automated Q&A systems on top of product documentation (see the sketch after this section)
Can accurately answer user questions about product features
Educational Applications
Learning Assistant Q&A
Helping students quickly find answers in textbooks
Can accurately extract relevant information from textbook passages
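As a rough illustration of the customer-support use case, the same pipeline can answer several user questions against a single documentation passage. The product description below is hypothetical example data, and the Hub ID remains an assumption.

```python
# Sketch of the customer-support use case: answering multiple user questions
# against one product-documentation passage. The product text is hypothetical.
from transformers import pipeline

MODEL_ID = "madlag/bert-base-uncased-squad1.1-block-sparse-0.20-v1"  # assumed ID
qa = pipeline("question-answering", model=MODEL_ID, tokenizer=MODEL_ID)

product_doc = (
    "The X200 router supports Wi-Fi 6, has four gigabit LAN ports, "
    "and can be managed through a mobile app or a web dashboard."
)
questions = [
    "How many LAN ports does the X200 have?",
    "Which Wi-Fi standard does the X200 support?",
]
for q in questions:
    print(q, "->", qa(question=q, context=product_doc)["answer"])
```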