80% 1x4 Block Sparse BERT-Base (uncased) Fine-Tuned on SQuADv1.1
This model is fine-tuned for question answering on the SQuAD 1.1 dataset, combining pruning and knowledge distillation.
🚀 Quick Start
This model has been fine-tuned for the NLP task of question answering, trained on the SQuAD 1.1 dataset. It is a result of fine-tuning a Prune Once For All 80% 1x4 block sparse pre-trained BERT-Base model, combined with knowledge distillation.
We present a new method for training sparse pre-trained Transformer language models by integrating weight pruning and model distillation. These sparse pre-trained models can be used for transfer learning on a wide range of tasks while maintaining their sparsity pattern. We show how the compressed sparse pre-trained models we trained transfer their knowledge to five different downstream natural language tasks with minimal accuracy loss.
✨ Features
- Fine-tuned for QA: Specifically optimized for the question answering task on the SQuAD 1.1 dataset.
- Sparse Model: Based on a Prune Once For All 80% 1x4 block sparse pre-trained BERT-Base model, combined with knowledge distillation.
- Transfer Learning: Can be used for transfer learning on a variety of tasks while maintaining sparsity.
📦 Installation
No specific installation steps are provided in the original README.
💻 Usage Examples
Basic Usage
Here is how to import this model in Python:
```python
import transformers
import model_compression_research as model_comp

# Load the sparse QA model from the Hugging Face Hub
model = transformers.AutoModelForQuestionAnswering.from_pretrained(
    'Intel/bert-base-uncased-squadv1.1-sparse-80-1x4-block-pruneofa'
)

# Lock the existing sparsity pattern so further training does not destroy it
scheduler = model_comp.pruning_scheduler_factory(
    model, '../../examples/transformers/question-answering/config/lock_config.json'
)

# Train your model...

# Detach the pruning hooks once training is done
scheduler.remove_pruning()
```
For more code examples, refer to the GitHub Repo.
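For a quick inference check, the fine-tuned checkpoint can also be loaded through the standard transformers question-answering pipeline (the question and context below are illustrative only; this assumes the checkpoint is reachable on the Hugging Face Hub):

```python
from transformers import pipeline

# Build a QA pipeline around the sparse checkpoint
qa = pipeline(
    "question-answering",
    model="Intel/bert-base-uncased-squadv1.1-sparse-80-1x4-block-pruneofa",
)

# Extractive QA: the answer is a span copied out of the context
result = qa(
    question="What dataset was the model fine-tuned on?",
    context="This sparse BERT-Base model was fine-tuned on the SQuAD 1.1 dataset.",
)
print(result["answer"])
```

The pipeline returns a dict with `answer`, `score`, `start`, and `end`, where `start`/`end` are character offsets into the context.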
📚 Documentation
Model Details
| Property | Details |
|---|---|
| Model Authors - Company | Intel |
| Model Card Authors | Intel |
| Date | February 27, 2022 |
| Version | 1 |
| Model Type | NLP - Question Answering |
| Architecture | "The method consists of two steps, teacher preparation and student pruning. The sparse pre-trained model we trained is the model we use for transfer learning while maintaining its sparsity pattern. We call the method Prune Once for All since we show how to fine-tune the sparse pre-trained models for several language tasks while we prune the pre-trained model only once." (Zafrir et al., 2021) |
| Paper or Other Resources | Paper: Zafrir et al. (2021); GitHub Repo |
| License | Apache 2.0 |
| Questions or Comments | Community Tab and Intel Developers Discord |
Visualization of the Prune Once for All method from Zafrir et al. (2021). More details can be found in their paper.
Intended Use
| Property | Details |
|---|---|
| Primary intended uses | You can use the model for the NLP task of question answering: given a corpus of text, you can ask it a question about that text, and it will find the answer in the text. |
| Primary intended users | Anyone doing question answering |
| Out-of-scope uses | The model should not be used to intentionally create hostile or alienating environments for people. |
Metrics (Model Performance)
| Model | Model Size | SQuADv1.1 (EM/F1) | MNLI-m (Acc) | MNLI-mm (Acc) | QQP (Acc/F1) | QNLI (Acc) | SST-2 (Acc) |
|---|---|---|---|---|---|---|---|
| 80% 1x4 Block Sparse BERT-Base uncased | - | 81.29/88.47 | - | - | - | - | - |
| 85% Sparse BERT-Base uncased | Medium | 81.10/88.42 | 82.71 | 83.67 | 91.15/88.00 | 90.34 | 91.46 |
| 90% Sparse BERT-Base uncased | Medium | 79.83/87.25 | 81.45 | 82.43 | 90.93/87.72 | 89.07 | 90.88 |
| 90% Sparse BERT-Large uncased | Large | 83.35/90.20 | 83.74 | 84.20 | 91.48/88.43 | 91.39 | 92.95 |
| 85% Sparse DistilBERT uncased | Small | 78.10/85.82 | 81.35 | 82.03 | 90.29/86.97 | 88.31 | 90.60 |
| 90% Sparse DistilBERT uncased | Small | 76.91/84.82 | 80.68 | 81.47 | 90.05/86.67 | 87.66 | 90.02 |
All results are the mean of two separate experiments with the same hyper-parameters and different seeds.
Training and Evaluation Data
| Property | Details |
|---|---|
| Datasets | SQuAD1.1: "Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable." (https://huggingface.co/datasets/squad) |
| Motivation | To build an efficient and accurate model for the question answering task. |
| Preprocessing | "We use the English Wikipedia dataset (2500M words) for training the models on the pre-training task. We split the data into train (95%) and validation (5%) sets. Both sets are preprocessed as described in the models' original papers (Devlin et al., 2019, Sanh et al., 2019). We process the data to use the maximum sequence length allowed by the models, however, we allow shorter sequences at a probability of 0.1." Following the pre-training on Wikipedia, fine-tuning is completed on the SQuAD1.1 dataset. |
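The "shorter sequences at a probability of 0.1" rule in the preprocessing quote can be sketched as follows. This is BERT-style sequence-length sampling; `sample_seq_length` is a hypothetical helper written for illustration, not code from the repository:

```python
import random

def sample_seq_length(max_len=512, short_prob=0.1, min_len=2):
    """With probability `short_prob`, pick a random shorter target length;
    otherwise pack sequences up to the model's maximum length."""
    if random.random() < short_prob:
        return random.randint(min_len, max_len - 1)
    return max_len

# Roughly 10% of sampled lengths fall below the maximum
lengths = [sample_seq_length() for _ in range(10_000)]
print(sum(l < 512 for l in lengths) / len(lengths))  # close to 0.1
```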
Ethical Considerations
| Property | Details |
|---|---|
| Data | The training data come from Wikipedia articles |
| Human life | The model is not intended to inform decisions central to human life or flourishing. It is an aggregated set of labelled Wikipedia articles. |
| Mitigations | No additional risk mitigation strategies were considered during model development. |
| Risks and harms | Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al., 2021, and Bender et al., 2021). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. Beyond this, the extent of the risks involved in using the model remains unknown. |
| Use cases | - |
Caveats and Recommendations
⚠️ Important Note
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. There are no additional caveats or recommendations for this model.
🔧 Technical Details
The method consists of two steps, teacher preparation and student pruning. The sparse pre-trained model is used for transfer learning while maintaining its sparsity pattern. The method is called Prune Once for All since the pre-trained model is pruned only once while fine-tuning for several language tasks. (Zafrir et al., 2021)
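To make the 1x4 block sparsity pattern concrete, here is a minimal NumPy sketch of one-shot magnitude pruning over 1x4 blocks. It is an illustration only: the actual Prune Once for All method prunes gradually during pre-training with distillation, and `prune_1x4_blocks` is a hypothetical helper, not part of the released code:

```python
import numpy as np

def prune_1x4_blocks(weight, sparsity=0.8):
    """Zero out the lowest-magnitude 1x4 blocks of a 2-D weight matrix."""
    rows, cols = weight.shape
    assert cols % 4 == 0, "columns must be divisible by the block width"
    # Group every 4 consecutive weights in a row into one block
    blocks = weight.reshape(rows, cols // 4, 4)
    # Score each block by its L2 norm
    scores = np.linalg.norm(blocks, axis=-1)
    # Threshold at the k-th smallest block score
    k = int(sparsity * scores.size)
    threshold = np.sort(scores, axis=None)[k - 1]
    mask = (scores > threshold)[..., np.newaxis]  # keep blocks above threshold
    return (blocks * mask).reshape(rows, cols)

w = np.random.randn(8, 16)
pruned = prune_1x4_blocks(w, sparsity=0.8)
print(1 - np.count_nonzero(pruned) / pruned.size)  # fraction of zeroed weights, close to 0.8
```

Because whole 1x4 blocks are zeroed rather than individual weights, the resulting pattern is friendlier to hardware that reads weights four at a time.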
📄 License
This model is licensed under the Apache 2.0 license.
BibTeX entry and citation info
```bibtex
@article{zafrir2021prune,
  title={Prune Once for All: Sparse Pre-Trained Language Models},
  author={Zafrir, Ofir and Larey, Ariel and Boudoukh, Guy and Shen, Haihao and Wasserblat, Moshe},
  journal={arXiv preprint arXiv:2111.05754},
  year={2021}
}
```