🚀 bert-large-uncased-whole-word-masking model fine-tuned on SQuAD v2
This model addresses the need for efficient question answering by leveraging pruning techniques. It offers a balance between computational speed and accuracy, making it suitable for resource-constrained environments.
✨ Features
- The linear layers of this model contain 25.0% of the original weights, and the model as a whole retains 32.0% of the original weights (a quick way to verify the density is sketched after this list).
- It runs 2.15x as fast as `bert-large-uncased-whole-word-masking` during evaluation, thanks to the structured matrices produced by the pruning method.
- It reaches an F1 score of 83.22, a 2.63-point drop compared to `bert-large-uncased-whole-word-masking`.
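
These density figures can be checked directly from the checkpoints. Below is a minimal verification sketch (an addition for illustration, assuming only `torch` and `transformers`; it downloads both the pruned and the original checkpoint and compares non-zero linear weights):

```python
import torch
from transformers import AutoModelForQuestionAnswering

def linear_weight_counts(model):
    """Count (non-zero, total) weights over all nn.Linear modules."""
    nonzero, total = 0, 0
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            nonzero += int((module.weight != 0).sum())
            total += module.weight.numel()
    return nonzero, total

pruned = AutoModelForQuestionAnswering.from_pretrained(
    "madlag/bert-large-uncased-wwm-squadv2-x2.15-f83.2-d25-hybrid-v1"
)
original = AutoModelForQuestionAnswering.from_pretrained(
    "bert-large-uncased-whole-word-masking"
)

pruned_nonzero, _ = linear_weight_counts(pruned)
_, original_total = linear_weight_counts(original)
# The model card quotes ~25% remaining weights in the linear layers
print(f"Remaining linear weights: {100 * pruned_nonzero / original_total:.1f}% of the original")
```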
📦 Installation
Install `nn_pruning`: it contains the optimization script, which simply packs the linear layers into smaller ones by removing empty rows/columns.

```bash
pip install nn_pruning
```
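
To make the "packing" idea concrete, here is a rough, illustration-only torch sketch (this is an assumption of how such packing can be expressed, not the actual `nn_pruning` implementation): rows and columns of a pruned linear layer that are entirely zero are dropped, producing a smaller dense layer.

```python
import torch
import torch.nn as nn

def pack_linear(layer: nn.Linear) -> nn.Linear:
    """Illustration only: drop all-zero rows/columns from a pruned linear layer."""
    W = layer.weight.data                        # shape (out_features, in_features)
    keep_rows = W.abs().sum(dim=1) != 0          # output rows with at least one non-zero
    keep_cols = W.abs().sum(dim=0) != 0          # input columns with at least one non-zero
    packed = nn.Linear(int(keep_cols.sum()), int(keep_rows.sum()),
                       bias=layer.bias is not None)
    packed.weight.data = W[keep_rows][:, keep_cols].clone()
    if layer.bias is not None:
        packed.bias.data = layer.bias.data[keep_rows].clone()
    return packed
```

Dropping input columns changes the layer's input dimension, so neighbouring layers must be pruned consistently; this is why the pruning has to be structured (block-wise) rather than arbitrary.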
💻 Usage Examples
Basic Usage
```python
from transformers import pipeline
from nn_pruning.inference_model_patcher import optimize_model

# Load the pruned checkpoint into a standard question-answering pipeline
qa_pipeline = pipeline(
    "question-answering",
    model="madlag/bert-large-uncased-wwm-squadv2-x2.15-f83.2-d25-hybrid-v1",
    tokenizer="madlag/bert-large-uncased-wwm-squadv2-x2.15-f83.2-d25-hybrid-v1"
)

print("bert-large-uncased-whole-word-masking parameters: 497.0M")
print(f"Parameters count (includes only head pruning, not feed forward pruning)={int(qa_pipeline.model.num_parameters() / 1E6)}M")

# Pack the remaining sparse linear layers into smaller dense ones to get the speedup
qa_pipeline.model = optimize_model(qa_pipeline.model, "dense")
print(f"Parameters count after complete optimization={int(qa_pipeline.model.num_parameters() / 1E6)}M")

predictions = qa_pipeline({
    'context': "Frédéric François Chopin, born Fryderyk Franciszek Chopin (1 March 1810 – 17 October 1849), was a Polish composer and virtuoso pianist of the Romantic era who wrote primarily for solo piano.",
    'question': "Who is Frederic Chopin?",
})
print("Predictions", predictions)
```
📚 Documentation
Fine-Pruning details
This model was fine-tuned from the HuggingFace [model](https://huggingface.co/bert-large-uncased-whole-word-masking) checkpoint on [SQuAD2.0](https://rajpurkar.github.io/SQuAD-explorer), and distilled from the model [madlag/bert-large-uncased-whole-word-masking-finetuned-squadv2](https://huggingface.co/madlag/bert-large-uncased-whole-word-masking-finetuned-squadv2). This model is case-insensitive.
A side-effect of the block pruning is that some of the attention heads are completely removed: 155 heads were removed out of a total of 384 (40.4%).
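
If you want to see the head removal in the checkpoint itself, here is a small sketch (an assumption on my part, not part of the released tooling: it relies on the pruned heads being stored as physically removed, so each layer's query projection shrinks accordingly):

```python
from transformers import AutoModelForQuestionAnswering

model = AutoModelForQuestionAnswering.from_pretrained(
    "madlag/bert-large-uncased-wwm-squadv2-x2.15-f83.2-d25-hybrid-v1"
)
n_heads = model.config.num_attention_heads              # 16 per layer in BERT-large
head_size = model.config.hidden_size // n_heads         # 64

# Kept heads per layer = rows of the query projection / head size
kept = sum(
    layer.attention.self.query.weight.shape[0] // head_size
    for layer in model.bert.encoder.layer
)
total = model.config.num_hidden_layers * n_heads        # 24 * 16 = 384
print(f"{total - kept} heads removed out of {total}")   # the card reports 155
```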
Details of the SQuAD 2.0 dataset
| Dataset   | Split | # samples |
| --------- | ----- | --------- |
| SQuAD 2.0 | train | 130.0K    |
| SQuAD 2.0 | eval  | 11.9k     |
Fine-tuning
- Python: 3.8.5
- Machine specs:
  - CPU: Intel(R) Core(TM) i7-6700K CPU
  - Memory: 64 GiB
  - GPUs: 1 GeForce GTX 3090, with 24 GiB memory
  - GPU driver: 455.23.05, CUDA: 11.1
Results

Pytorch model file size: 1119MB (original BERT: 1228.0MB)
| Metric | # Value | # Original ([Table 2](https://www.aclweb.org/anthology/N19-1423.pdf)) | Variation |
| ------ | ------- | ---------------------------------------------------------------------- | --------- |
| EM     | 80.19   | 82.83                                                                   | -2.64     |
| F1     | 83.22   | 85.85                                                                   | -2.63     |
```json
{
  "HasAns_exact": 76.48448043184885,
  "HasAns_f1": 82.55514100819374,
  "HasAns_total": 5928,
  "NoAns_exact": 83.8856181665265,
  "NoAns_f1": 83.8856181665265,
  "NoAns_total": 5945,
  "best_exact": 80.19034784805862,
  "best_exact_thresh": 0.0,
  "best_f1": 83.22133208932635,
  "best_f1_thresh": 0.0,
  "exact": 80.19034784805862,
  "f1": 83.22133208932645,
  "total": 11873
}
```
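
These are the standard SQuAD v2 metrics. For reference, here is a minimal sketch of how such numbers can be computed with the `evaluate` library (the loop running the pipeline over the full SQuAD 2.0 dev set is omitted, and the example id and answer below are purely hypothetical):

```python
import evaluate

squad_v2_metric = evaluate.load("squad_v2")

# Hypothetical single-example inputs; real usage collects one prediction per dev-set question.
predictions = [{
    "id": "example-0",
    "prediction_text": "a Polish composer and virtuoso pianist",
    "no_answer_probability": 0.0,
}]
references = [{
    "id": "example-0",
    "answers": {"text": ["a Polish composer and virtuoso pianist"], "answer_start": [76]},
}]

print(squad_v2_metric.compute(predictions=predictions, references=references))
```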
📄 License
This model is released under the MIT license.