# bert-base-uncased model fine-tuned on SST-2
This project presents a fine-tuned `bert-base-uncased` model on the SST-2 dataset. It leverages the `nn_pruning` library to optimize the model's weights, achieving a balance between performance and model size.
## Quick Start
To start using this model, you first need to install the `nn_pruning` library. It includes an optimization script that packs linear layers into smaller ones by removing empty rows/columns.
```bash
pip install nn_pruning
```
Then, you can use the `transformers` library as usual, but remember to call `optimize_model` after the pipeline has loaded.
```python
from transformers import pipeline
from nn_pruning.inference_model_patcher import optimize_model

cls_pipeline = pipeline(
    "text-classification",
    model="echarlaix/bert-base-uncased-sst2-acc91.1-d37-hybrid",
    tokenizer="echarlaix/bert-base-uncased-sst2-acc91.1-d37-hybrid",
)

print(f"Parameters count (includes only head pruning, no feed forward pruning)={int(cls_pipeline.model.num_parameters() / 1E6)}M")

# Pack the sparse linear layers into smaller dense ones.
cls_pipeline.model = optimize_model(cls_pipeline.model, "dense")

print(f"Parameters count after optimization={int(cls_pipeline.model.num_parameters() / 1E6)}M")

predictions = cls_pipeline("This restaurant is awesome")
print(predictions)
```
## Features
- Weight Optimization: The linear layers of the model contain 37% of the original weights, and the model contains 51% of the original weights overall (a way to check the first figure is sketched below).
- Performance: It achieves an accuracy of 91.17 on the SST-2 dataset.
- Case-Insensitive: The model does not distinguish between different cases of English words.
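The 37% figure can be sanity-checked directly. The following is a minimal sketch, not part of the original project; it assumes that, before `optimize_model` is called, pruned weights are still present in the checkpoint as zeros, so density is the fraction of non-zero entries in the encoder's linear layers:

```python
import torch
from transformers import AutoModelForSequenceClassification

# Load the checkpoint as-is: pruned weights are assumed to still be stored
# as zeros, since optimize_model has not been called yet.
model = AutoModelForSequenceClassification.from_pretrained(
    "echarlaix/bert-base-uncased-sst2-acc91.1-d37-hybrid"
)

total = nonzero = 0
for name, module in model.named_modules():
    # Only the encoder's linear layers are pruned; embeddings are left intact.
    if isinstance(module, torch.nn.Linear) and name.startswith("bert.encoder"):
        total += module.weight.numel()
        nonzero += int((module.weight != 0).sum())

print(f"Encoder linear-layer density: {nonzero / total:.1%}")  # expected near 37%
```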
## Documentation
### Fine-pruning details
This model was fine-tuned from the HuggingFace model checkpoint on the task, and distilled from the model `textattack/bert-base-uncased-SST-2`.
A side effect of the block pruning method is that some of the attention heads are completely removed: 88 heads were removed out of a total of 144 (61.1%).
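As a rough way to see this in the checkpoint, here is a hedged sketch (the weight layout is an assumption on my part, not something the card documents): a fully removed head should appear as an all-zero block of rows in the attention projections until `optimize_model` physically drops it.

```python
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "echarlaix/bert-base-uncased-sst2-acc91.1-d37-hybrid"
)
cfg = model.config
head_dim = cfg.hidden_size // cfg.num_attention_heads  # 768 // 12 = 64

removed = 0
for layer in model.bert.encoder.layer:
    # Rows of the query projection are grouped by head; a pruned head's
    # rows are assumed to be all zeros before optimize_model runs.
    q = layer.attention.self.query.weight
    for h in range(cfg.num_attention_heads):
        if torch.count_nonzero(q[h * head_dim : (h + 1) * head_dim]) == 0:
            removed += 1

total = cfg.num_hidden_layers * cfg.num_attention_heads
print(f"Empty heads: {removed} / {total}")  # the card reports 88 / 144 (61.1%)
```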
### Details of the SST-2 dataset

| Dataset | Split | # samples |
|---------|-------|-----------|
| SST-2   | train | 67K       |
| SST-2   | eval  | 872       |
### Results

PyTorch model file size: `351MB` (original BERT: `420MB`)
| Metric   | Value | Original (Table 2) | Variation |
|----------|-------|--------------------|-----------|
| accuracy | 91.17 | 92.7               | -1.53     |
## Technical Details
The model uses the `nn_pruning` Python library to optimize the linear layers. The embeddings, which account for a significant part of the model, are not pruned by this method. The block pruning method also affects the attention heads in the model, as detailed above.
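As a back-of-the-envelope consistency check (using standard `bert-base-uncased` dimensions, which are assumptions here rather than figures from this card), keeping the embeddings dense while thinning the encoder's linear layers to 37% lands close to the reported 51% overall:

```python
# Standard bert-base-uncased sizes (assumed): vocab 30522, hidden 768,
# 512 positions, 2 token types, 12 layers, feed-forward width 3072.
embeddings = 30522 * 768 + 512 * 768 + 2 * 768            # unpruned
encoder_linear = 12 * (4 * 768 * 768 + 2 * 768 * 3072)    # QKV+output + feed-forward

kept = embeddings + 0.37 * encoder_linear                 # linear layers keep 37%
print(f"overall weight density ~ {kept / (embeddings + encoder_linear):.0%}")  # ~51%
```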
## License
This project is licensed under the Apache-2.0 license.