bert-base-uncased-squad1.1-block-sparse-0.07-v1 Open-source Q&A Model - Lightweight Structure, Significantly Improved Evaluation Speed

Bert Base Uncased Squad1.1 Block Sparse 0.07 V1

Developed by madlag

This is a BERT-base uncased model fine-tuned on the SQuAD1.1 dataset, featuring a block sparse structure that retains only 28.2% of the original weights, with an evaluation speed 1.92 times faster than the dense network.

Question Answering System

Transformers

English

Open Source License: MIT

#Question Answering System #Block Sparse Structure #Efficient Inference

Downloads 209

Release Time: 3/2/2022

Model Overview

This model is designed for extractive question answering: given a context passage and a question, it returns the answer span from the context. It was pruned with a modified version of movement pruning, which improves inference speed at the cost of some accuracy.

Model Features

Block Sparse Structure

Linear layers retain only 7.5% of the original weights, with an overall retention of 28.2%, significantly reducing model size and improving inference speed.

Efficient Inference

Evaluation speed is 1.92 times faster than dense networks while maintaining relatively high accuracy.

Attention Head Optimization

106 out of 144 attention heads (73.6%) were removed, optimizing the model structure.

Model Capabilities

Text-based Question Answering

Context Understanding

Information Extraction

Use Cases

Intelligent Question Answering System

Fact-based Question Answering

Answer specific factual questions based on provided context

EM: 71.88, F1: 81.36

Educational Applications

Learning Assistance Q&A

Help students quickly find answers to questions from textbook content

🚀 BERT-base uncased model fine-tuned on SQuAD v1

This is a block sparse BERT-base uncased model fine-tuned on SQuAD v1, which can run faster with some impact on accuracy.

🚀 Quick Start

This model is block sparse: the linear layers contain 7.5% of the original weights, and the model contains 28.2% of the original weights overall. Training used a modified version of Victor Sanh's Movement Pruning method. With the block-sparse runtime, the model ran 1.92x faster than the dense network on the evaluation, at the price of some impact on accuracy (see below).
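To make "block sparse" concrete, here is a minimal sketch of block pruning on a single weight matrix. It is an illustration only, not the training procedure used for this model: the function names, the block size of 8, and the 64x64 matrix are hypothetical, and the tile score is plain magnitude rather than the learned scores that movement pruning uses.

```python
import random


def block_prune(weights, block, keep_fraction):
    """Zero out all but the highest-scoring (block x block) tiles of a 2-D matrix.

    The tile score is the sum of absolute values, an assumption made for
    illustration; movement pruning learns its scores during fine-tuning.
    """
    rows, cols = len(weights), len(weights[0])
    assert rows % block == 0 and cols % block == 0
    tiles = [(r, c) for r in range(0, rows, block) for c in range(0, cols, block)]

    def score(tile):
        r0, c0 = tile
        return sum(
            abs(weights[r][c])
            for r in range(r0, r0 + block)
            for c in range(c0, c0 + block)
        )

    # Keep the top-scoring tiles; every other tile becomes all zeros.
    n_keep = max(1, round(len(tiles) * keep_fraction))
    kept = set(sorted(tiles, key=score, reverse=True)[:n_keep])

    pruned = [[0.0] * cols for _ in range(rows)]
    for r0, c0 in kept:
        for r in range(r0, r0 + block):
            for c in range(c0, c0 + block):
                pruned[r][c] = weights[r][c]
    return pruned, kept


def density(matrix):
    """Fraction of entries that are non-zero."""
    total = sum(len(row) for row in matrix)
    nonzero = sum(1 for row in matrix for v in row if v != 0.0)
    return nonzero / total


random.seed(0)
W = [[random.gauss(0.0, 1.0) for _ in range(64)] for _ in range(64)]
W_sparse, kept_tiles = block_prune(W, block=8, keep_fraction=0.075)
print(len(kept_tiles), density(W_sparse))
```

Because whole tiles are zeroed rather than scattered individual weights, a block-sparse runtime can skip the empty tiles entirely, which is where the speedup comes from.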

This model was fine-tuned from the HuggingFace BERT base uncased checkpoint on SQuAD1.1, and distilled from the equivalent model csarron/bert-base-uncased-squad-v1. This model is case-insensitive: it does not distinguish between english and English.
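Case-insensitivity comes from the uncased tokenizer, which lowercases text and strips accents before splitting it into word pieces. The sketch below approximates that normalization step using only the standard library; `uncased_normalize` is a hypothetical helper for illustration, not a function from the Transformers library.

```python
import unicodedata


def uncased_normalize(text):
    """Lowercase and strip accents, approximating uncased BERT pre-tokenization."""
    text = text.lower()
    # NFD splits accented characters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category "Mn").
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(uncased_normalize("English") == uncased_normalize("english"))
print(uncased_normalize("Frédéric François Chopin"))
```

This is also why a question spelled "Frederic Chopin" can match a context that spells the name "Frédéric".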

✨ Features

Block sparse structure, running 1.92x faster than dense networks on evaluation with some accuracy impact.
Fine-tuned on SQuAD v1 and distilled from an equivalent model.
Case-insensitive.

🔧 Technical Details

Pruning details

A side effect of block pruning is that some attention heads are removed completely: 106 of the 144 heads (73.6%) were removed. The figure below shows how the remaining heads are distributed in the network after pruning.

[Figure: Pruning details]
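The head counts above can be checked with a few lines of arithmetic. The only figure not taken from this page is the standard BERT-base layout of 12 layers with 12 attention heads each.

```python
NUM_LAYERS = 12        # encoder layers in BERT-base
HEADS_PER_LAYER = 12   # attention heads per layer in BERT-base

total_heads = NUM_LAYERS * HEADS_PER_LAYER
removed_heads = 106    # figure quoted in the pruning details
remaining_heads = total_heads - removed_heads
removed_pct = round(100 * removed_heads / total_heads, 1)

print(total_heads, remaining_heads, removed_pct)
```

That leaves 38 heads across the whole network, about three per layer on average.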

Density plot

[Figure: Density plot]

Details

Dataset	Split	# samples
SQuAD1.1	train	90.6K
SQuAD1.1	eval	11.1K

Fine-tuning

Python: 3.8.5
Machine specs:

Memory: 64 GiB
GPUs: 1 GeForce RTX 3090, with 24GiB memory
GPU driver: 455.23.05, CUDA: 11.1

Results

PyTorch model file size: 335M (original BERT: 438M)

Metric	Value	Original (Table 2 of the BERT paper)
EM	71.88	80.8
F1	81.36	88.5
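For readers unfamiliar with the two metrics, the sketch below shows how exact match (EM) and token-level F1 are typically computed for SQuAD v1.1. It follows the logic of the standard evaluation script in simplified form (one reference answer per question instead of the maximum over several), so treat it as an illustration rather than the exact code used to produce the numbers above.

```python
import re
import string
from collections import Counter


def normalize_answer(s):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction, truth):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    common = Counter(pred_tokens) & Counter(truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


prediction = "a Polish composer and virtuoso pianist"
truth = "Polish composer"
print(exact_match(prediction, truth), f1_score(prediction, truth))
```

EM rewards only exact span matches, while F1 gives partial credit for overlapping tokens, which is why the F1 figures in the table are higher than the EM figures.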

💻 Usage Examples

Basic Usage

from transformers import pipeline

# Load the pruned model and its tokenizer from the Hugging Face Hub.
qa_pipeline = pipeline(
    "question-answering",
    model="madlag/bert-base-uncased-squad1.1-block-sparse-0.07-v1",
    tokenizer="madlag/bert-base-uncased-squad1.1-block-sparse-0.07-v1",
)

# The pipeline extracts the answer span for the question from the context.
predictions = qa_pipeline(
    question="Who is Frederic Chopin?",
    context="Frédéric François Chopin, born Fryderyk Franciszek Chopin (1 March 1810 – 17 October 1849), was a Polish composer and virtuoso pianist of the Romantic era who wrote primarily for solo piano.",
)

# The result is a dict with the answer text, its score, and its character offsets.
print(predictions)

📄 License

This model is released under the MIT license.
