bert-base-uncased-sparse-85-unstructured-pruneofa Open Source Model - Multi-Task Fine-tuning, Pruning to Reduce Overhead

Bert Base Uncased Sparse 85 Unstructured Pruneofa

Developed by Intel

This is a sparse pre-trained model that can be fine-tuned for various language tasks, reducing computational overhead through weight pruning.

Large Language Model

Transformers

Language: English | Open Source License: Apache-2.0 | Tags: Weight Pruning, Efficient Transfer Learning, Pre-trained Models

Release Time: 3/2/2022

Model Overview

This model is produced with a general prune-once method: the weight matrices are sparsified during pre-training to cut computational cost while largely preserving accuracy, and the resulting model can then be fine-tuned for various downstream language tasks.

Model Features

General Prune-Once Method

The model is pruned once, during pre-training, and then fine-tuned for multiple downstream tasks, so no task-specific pruning is needed.

85% Weight Sparsity

Sets 85% of the weights to zero (unstructured sparsity), which can reduce computational overhead on runtimes that exploit sparse matrices.

Multi-Task Adaptability

Can be fine-tuned for various language tasks such as question answering, natural language inference, and sentiment classification.
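
To make the 85% sparsity feature concrete, here is a minimal sketch of unstructured magnitude pruning on a toy weight list. It is an illustration only: the function names and the random toy weights are hypothetical, and zeroing the smallest weights in a single pass is a simplification, not necessarily the procedure used to produce this model.

```python
import random


def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude entries set to 0.0.

    `weights` is a flat list of floats; `sparsity` is the fraction to zero out.
    """
    n_prune = int(round(len(weights) * sparsity))
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def sparsity_ratio(weights):
    """Fraction of entries that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


# Hypothetical toy weights standing in for one weight matrix.
rng = random.Random(0)
toy = [rng.gauss(0.0, 1.0) for _ in range(1000)]
pruned = magnitude_prune(toy, 0.85)
print(f"sparsity after pruning: {sparsity_ratio(pruned):.2f}")
```

"Unstructured" here means individual weights are zeroed wherever they are small, rather than whole rows, columns, or attention heads.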

Model Capabilities

Text Understanding

Language Model Fine-tuning

Question Answering System Support

Sentiment Analysis

Natural Language Inference

Use Cases

Question Answering Systems

SQuAD Question Answering

Fine-tuned for the Stanford Question Answering Dataset

EM/F1 scores 81.10/88.42

Text Classification

Sentiment Analysis

Used for SST-2 sentiment classification task

Accuracy 91.46%

Natural Language Inference

MNLI Task

Used for multi-genre natural language inference

MNLI-m accuracy 82.71%, MNLI-mm accuracy 83.67%

🚀 85% Sparse BERT-Base (uncased) Prune Once for All

This is a sparse pre-trained model that can be fine-tuned for various language tasks, reducing computational overhead while maintaining performance.

🚀 Quick Start

This model is a sparse pre-trained model that can be fine-tuned for a wide range of language tasks. Weight pruning sets some of the neural network's weights to zero, which makes the weight matrices sparser. Since updating neural network weights involves matrix multiplication, keeping the matrices sparse while retaining the important information can reduce overall computational overhead. The "sparse-85" in the model name indicates the sparsity ratio: 85% of the weights are zero. For more details, refer to Zafrir et al. (2021).
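
The paragraph above argues that sparse matrices mean less multiplication work. The following sketch counts multiplications for a dense and a sparse matrix-vector product on a small hand-written matrix. The function names and the matrix are hypothetical, and the sparse-row format is just one simple way to skip zeros.

```python
def dense_matvec(matrix, vec):
    """Multiply a row-major matrix by a vector, counting every multiplication."""
    mults = 0
    out = []
    for row in matrix:
        acc = 0.0
        for w, x in zip(row, vec):
            acc += w * x
            mults += 1
        out.append(acc)
    return out, mults


def to_sparse_rows(matrix):
    """Keep only (column, value) pairs for the nonzero weights of each row."""
    return [[(j, w) for j, w in enumerate(row) if w != 0.0] for row in matrix]


def sparse_matvec(sparse_rows, vec):
    """Same product as dense_matvec, but zero weights are never touched."""
    mults = 0
    out = []
    for row in sparse_rows:
        acc = 0.0
        for j, w in row:
            acc += w * vec[j]
            mults += 1
        out.append(acc)
    return out, mults


# Toy 3x4 matrix with 9 of 12 weights pruned to zero.
matrix = [
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, -1.0],
    [3.0, 0.0, 0.0, 0.0],
]
vec = [1.0, 2.0, 3.0, 4.0]

dense_out, dense_mults = dense_matvec(matrix, vec)
sparse_out, sparse_mults = sparse_matvec(to_sparse_rows(matrix), vec)
print(dense_out, dense_mults)    # same result, 12 multiplications
print(sparse_out, sparse_mults)  # same result, 3 multiplications
```

The operation count is only part of the picture: whether fewer multiplications turn into faster inference depends on the hardware and on a runtime that supports unstructured sparsity.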

[Figure: visualization of the Prune Once for All method, from Zafrir et al. (2021).]

✨ Features

Sparse Structure: Reduces computational overhead by setting some weights to zero.
Fine-Tunable: Can be fine-tuned for multiple language tasks.

💻 Usage Examples

Basic Usage

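import transformers

# Load the sparse pre-trained encoder with a question-answering head on top.
# The checkpoint is pre-trained only: fine-tune it before using its predictions.
model = transformers.AutoModelForQuestionAnswering.from_pretrained('Intel/bert-base-uncased-sparse-85-unstructured-pruneofa')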

For more code examples, refer to the GitHub Repo.
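
After loading a checkpoint, it can be useful to check how sparse each parameter actually is. The helper below is a minimal sketch that works on plain Python lists so it runs without any dependencies. The function name, the parameter names, and the values are hypothetical stand-ins, not taken from the real checkpoint.

```python
def sparsity_report(named_weights):
    """Map each parameter name to the fraction of its entries that are zero.

    `named_weights` maps a parameter name to a flat list of floats.
    """
    report = {}
    for name, values in named_weights.items():
        zeros = sum(1 for v in values if v == 0.0)
        report[name] = zeros / len(values) if values else 0.0
    return report


# Hypothetical stand-in for real model parameters.
toy_weights = {
    "encoder.layer.0.attention.query.weight": [0.0, 0.0, 0.0, 0.5, 0.0, 0.0, -0.2, 0.0],
    "encoder.layer.0.attention.query.bias": [0.1, -0.3],
}

report = sparsity_report(toy_weights)
for name, ratio in report.items():
    print(f"{name}: {ratio:.2%} zeros")
```

With a loaded PyTorch model, the input could be built as `{name: p.detach().flatten().tolist() for name, p in model.named_parameters()}`, although for a model of this size a tensor-based count would be much faster than converting to lists.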

📚 Documentation

Model Details

Property	Details
Model Authors - Company	Intel
Date	September 30, 2021
Version	1
Model Type	NLP - General sparse language model
Architecture	"The method consists of two steps, teacher preparation and student pruning. The sparse pre-trained model we trained is the model we use for transfer learning while maintaining its sparsity pattern. We call the method Prune Once for All since we show how to fine-tune the sparse pre-trained models for several language tasks while we prune the pre-trained model only once." (Zafrir et al., 2021)
Paper or Other Resources	Zafrir et al. (2021); GitHub Repo
License	Apache 2.0
Questions or Comments	Community Tab and Intel Developers Discord
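
The Architecture row says the sparse model keeps its sparsity pattern during transfer learning. One common way to achieve that is to mask every update so pruned weights stay at zero. This card does not say how the authors enforce the pattern, so the following is a minimal sketch under that assumption, with hypothetical helper names, toy weights, and made-up gradients.

```python
def make_mask(weights):
    """True where the pre-trained weight is nonzero, False where it was pruned."""
    return [w != 0.0 for w in weights]


def masked_sgd_step(weights, grads, mask, lr):
    """One plain SGD step that leaves pruned positions at exactly zero."""
    return [
        (w - lr * g) if keep else 0.0
        for w, g, keep in zip(weights, grads, mask)
    ]


# Toy sparse weights: positions 0 and 2 were pruned before fine-tuning.
weights = [0.0, 0.4, 0.0, -0.7]
mask = make_mask(weights)

# Hypothetical gradients from one fine-tuning step.
grads = [0.5, -0.1, 0.3, 0.2]

updated = masked_sgd_step(weights, grads, mask, lr=0.1)
print(updated)  # pruned positions are still 0.0; the others have moved
```

The mask is computed once from the pre-trained weights and reused for every step, so a surviving weight that happens to pass through zero during fine-tuning is not mistaken for a pruned one.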

Intended Use

Property	Details
Primary intended uses	This is a general sparse language model; in its current form, it is not ready for downstream prediction tasks, but it can be fine-tuned for several language tasks including (but not limited to) question answering, multi-genre natural language inference, and sentiment classification.
Primary intended users	Anyone who needs an efficient general language model for other downstream tasks.
Out-of-scope uses	The model should not be used to intentionally create hostile or alienating environments for people.

Metrics (Model Performance)

Model	Model Size	SQuADv1.1 (EM/F1)	MNLI-m (Acc)	MNLI-mm (Acc)	QQP (Acc/F1)	QNLI (Acc)	SST-2 (Acc)
80% Sparse BERT-Base uncased fine-tuned on SQuAD1.1	-	81.29/88.47	-	-	-	-	-
85% Sparse BERT-Base uncased	Medium	81.10/88.42	82.71	83.67	91.15/88.00	90.34	91.46
90% Sparse BERT-Base uncased	Medium	79.83/87.25	81.45	82.43	90.93/87.72	89.07	90.88
90% Sparse BERT-Large uncased	Large	83.35/90.20	83.74	84.20	91.48/88.43	91.39	92.95
85% Sparse DistilBERT uncased	Small	78.10/85.82	81.35	82.03	90.29/86.97	88.31	90.60
90% Sparse DistilBERT uncased	Small	76.91/84.82	80.68	81.47	90.05/86.67	87.66	90.02

All results are the mean of two separate experiments with the same hyperparameters and different seeds.

Training and Evaluation Data

Property	Details
Datasets	English Wikipedia Dataset (2500M words).
Motivation	To build an efficient and accurate base model for several downstream language tasks.
Preprocessing	"We use the English Wikipedia dataset (2500M words) for training the models on the pre-training task. We split the data into train (95%) and validation (5%) sets. Both sets are preprocessed as described in the models’ original papers (Devlin et al., 2019, Sanh et al., 2019). We process the data to use the maximum sequence length allowed by the models, however, we allow shorter sequences at a probability of 0.1."
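
The preprocessing quote describes two mechanical steps: a 95%/5% train/validation split and an occasional shorter sequence length. The sketch below illustrates both, reading the stated probability as 0.1. The function names, the shuffled document-level split, the minimum length of 2, and the maximum length of 512 are all assumptions for illustration, not details confirmed by this card.

```python
import random


def train_val_split(items, val_fraction=0.05, seed=0):
    """Shuffle a copy of `items` and hold out `val_fraction` for validation."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_val = int(round(len(items) * val_fraction))
    return items[n_val:], items[:n_val]


def target_seq_length(max_len, rng, short_prob=0.1, min_len=2):
    """Use `max_len` most of the time; with probability `short_prob`,
    draw a strictly shorter length uniformly at random."""
    if rng.random() < short_prob:
        return rng.randint(min_len, max_len - 1)
    return max_len


# Hypothetical document identifiers standing in for Wikipedia articles.
docs = [f"doc-{i}" for i in range(200)]
train, val = train_val_split(docs)
print(len(train), len(val))

# Assumed maximum sequence length of 512 tokens.
rng = random.Random(0)
lengths = [target_seq_length(512, rng) for _ in range(10_000)]
short_share = sum(1 for n in lengths if n < 512) / len(lengths)
print(f"share of shorter sequences: {short_share:.3f}")
```

Mixing in some shorter sequences exposes the model to inputs that do not fill the whole context window, which is closer to what it sees during fine-tuning.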

Ethical Considerations

Property	Details
Data	The training data come from Wikipedia articles
Human life	The model is not intended to inform decisions central to human life or flourishing. It is an aggregated set of labelled Wikipedia articles.
Mitigations	No additional risk mitigation strategies were considered during model development.
Risks and harms	Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al., 2021, and Bender et al., 2021). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. Beyond this, the extent of the risks involved in using the model remains unknown.
Use cases	-

Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. There are no additional caveats or recommendations for this model.

🔧 Technical Details

Weight pruning is what defines this model. Setting some weights to zero gives the model a sparser structure, which in turn reduces the computational overhead of the matrix multiplications involved in updating the weights. With the "Prune Once for All" method, the pre-trained model is pruned only once, and the resulting sparse model can then be fine-tuned for multiple language tasks.

📄 License

This model is licensed under the Apache 2.0 license.

📚 BibTeX entry and citation info

@article{zafrir2021prune,
  title={Prune Once for All: Sparse Pre-Trained Language Models},
  author={Zafrir, Ofir and Larey, Ariel and Boudoukh, Guy and Shen, Haihao and Wasserblat, Moshe},
  journal={arXiv preprint arXiv:2111.05754},
  year={2021}
}
