BERT Mini Finetuned SST2 Open-Source Model - Free for Text Classification Tasks, Precise and Efficient

Bert Mini Finetuned Sst2

Developed by M-FAC

This model is a BERT-mini model fine-tuned on the SST-2 dataset using the M-FAC second-order optimizer for text classification tasks.

Text Classification

Transformers

#Second-order optimization fine-tuning #Text sentiment analysis #Efficient matrix approximation

Downloads 13.90k

Release Time : 3/2/2022

Model Overview

This model is based on the BERT-mini architecture and fine-tuned on the SST-2 sentiment analysis dataset using the M-FAC optimizer. Its primary use is for sentence-level sentiment classification.

Model Features

M-FAC second-order optimization

Fine-tunes with the M-FAC second-order optimizer. Second-order information can improve convergence over first-order optimizers such as Adam, although on SST-2 the Adam baseline reported in this card scores slightly higher on average.
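For reference, the quantity M-FAC works with can be sketched as a damped empirical Fisher built from the last m gradients, used to precondition the gradient step. This is the standard formulation written in our own notation. Reading λ as `damp`, m as `num_grads` and η as `lr` in the hyperparameters listed later in this card is our assumption.

```latex
\hat{F}_m = \lambda I_d + \frac{1}{m} \sum_{i=1}^{m} \nabla \ell_i \, \nabla \ell_i^{\top},
\qquad
w_{t+1} = w_t - \eta \, \hat{F}_m^{-1} \nabla \ell(w_t)
```

Because $\hat{F}_m$ is $\lambda I_d$ plus m rank-one terms, $\hat{F}_m^{-1} v$ can be evaluated with m successive Sherman-Morrison updates. This means the d × d matrix never has to be formed, which is what "matrix-free" refers to.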

Lightweight architecture

Built on BERT-mini, a compact BERT variant that is much smaller and cheaper to run than full-size BERT models while still giving reasonable accuracy.
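To make "smaller model size" concrete, the sketch below counts the parameters of a standard BERT encoder plus pooler. It excludes the classification head. It assumes BERT-mini uses 4 layers, hidden size 256 and intermediate size 1024, and that both models share the 30,522-token uncased WordPiece vocabulary and 512 positions. These configuration values are assumptions, not taken from this card.

```python
def bert_param_count(layers: int, hidden: int, intermediate: int,
                     vocab: int = 30522, max_pos: int = 512, type_vocab: int = 2) -> int:
    """Parameters of a standard BERT encoder plus pooler (no task-specific head)."""
    # Word, position and token-type embeddings, plus the embedding LayerNorm (gain + bias).
    embeddings = (vocab + max_pos + type_vocab) * hidden + 2 * hidden
    # Self-attention: Q, K, V and output projections, each hidden x hidden with a bias.
    attention = 4 * (hidden * hidden + hidden)
    # Feed-forward block: hidden -> intermediate -> hidden, with biases.
    ffn = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    # Two LayerNorms per layer (after attention and after the feed-forward block).
    layer_norms = 2 * 2 * hidden
    per_layer = attention + ffn + layer_norms
    # Pooler: one dense hidden x hidden layer with bias applied to the [CLS] token.
    pooler = hidden * hidden + hidden
    return embeddings + layers * per_layer + pooler


mini = bert_param_count(layers=4, hidden=256, intermediate=1024)    # assumed BERT-mini config
base = bert_param_count(layers=12, hidden=768, intermediate=3072)   # BERT-base config
print(f"BERT-mini ~{mini / 1e6:.1f}M, BERT-base ~{base / 1e6:.1f}M, ratio ~{base / mini:.1f}x")
# prints: BERT-mini ~11.2M, BERT-base ~109.5M, ratio ~9.8x
```

Under these assumptions BERT-mini has roughly a tenth of BERT-base's parameters. Most of them sit in the word-embedding table.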

Reproducibility

Provides the full training command and hyperparameter settings needed to reproduce the reported results.

Model Capabilities

Text classification

Sentiment analysis

Sentence-level semantic understanding

Use Cases

Sentiment analysis

Product review sentiment classification

Analyze whether user reviews of products are positive or negative.

Achieved 84.74% accuracy on the SST-2 validation set (best of 5 runs; the 5-run mean is 84.20%).

Social media sentiment monitoring

Monitor the sentiment tendencies of users on social media regarding specific topics.
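In both use cases, the classifier outputs one raw score (logit) per class, and a positive/negative decision is read off after a softmax. The sketch below shows that post-processing step only. It assumes index 0 means negative and index 1 means positive, following the GLUE SST-2 convention. The model's own label mapping should be checked before relying on it, and `classify` is a hypothetical helper, not part of any library.

```python
import math

# Assumed label order (GLUE SST-2 convention: 0 = negative, 1 = positive).
LABELS = ("negative", "positive")


def classify(logits):
    """Turn raw classifier logits into (label, probability) via a softmax."""
    # Subtract the max logit before exponentiating for numerical stability.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


label, p = classify([-1.0, 2.0])
print(label, round(p, 4))  # positive 0.9526
```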

🚀 BERT-mini model finetuned with M-FAC

This project presents a BERT-mini model fine-tuned on the SST-2 dataset with the M-FAC optimizer, as an alternative to the standard Adam-based fine-tuning for text classification.

🚀 Quick Start

This model is fine-tuned on the SST-2 dataset with the state-of-the-art second-order optimizer M-FAC. For more details on M-FAC, see the NeurIPS 2021 paper: https://arxiv.org/pdf/2107.03356.pdf.

✨ Features

The model is fine-tuned with the M-FAC second-order optimizer in place of the default Adam optimizer.
The fine-tuning setup is otherwise identical to the default Adam baseline, so the two optimizers can be compared fairly.


💻 Usage Examples

Basic Usage

To reproduce the results, add the M-FAC optimizer code to [https://github.com/huggingface/transformers/blob/master/examples/pytorch/text-classification/run_glue.py](https://github.com/huggingface/transformers/blob/master/examples/pytorch/text-classification/run_glue.py) and run the following bash script:

```bash
# Assumes run_glue.py has been extended with the M-FAC optimizer as described above.
CUDA_VISIBLE_DEVICES=0 python run_glue.py \
  --seed 1234 \
  --model_name_or_path prajjwal1/bert-mini \
  --task_name sst2 \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 1e-4 \
  --num_train_epochs 3 \
  --output_dir out_dir/ \
  --optim MFAC \
  --optim_args '{"lr": 1e-4, "num_grads": 1024, "damp": 1e-6}'
```
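The arithmetic below shows how many optimizer steps this run takes. It assumes the following:
- The GLUE SST-2 training split has 67,349 examples.
- Training runs on the single GPU selected by `CUDA_VISIBLE_DEVICES=0`.
- There is no gradient accumulation.
- The last partial batch is kept.

The final line also assumes that M-FAC keeps a sliding window of the most recent `num_grads` gradients.

```python
import math

TRAIN_EXAMPLES = 67_349   # size of the GLUE SST-2 training split
BATCH_SIZE = 32           # --per_device_train_batch_size on one GPU, no gradient accumulation
EPOCHS = 3                # --num_train_epochs
NUM_GRADS = 1024          # "num_grads" in --optim_args (assumed sliding-window length)

# The last partial batch is kept (no drop_last), so round up.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
print(steps_per_epoch, total_steps)  # 2105 6315
# Fraction of an epoch spanned by the gradient window (under the sliding-window assumption).
print(f"gradient window covers {NUM_GRADS / steps_per_epoch:.2f} epochs")
```

Under these assumptions the gradient window spans roughly half an epoch of updates, and it is full for most of the 6,315 steps.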

📚 Documentation

Finetuning setup

For a fair comparison against the default Adam baseline, we fine-tune the model in the same framework as described here [https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification) and only swap the Adam optimizer for M-FAC. The M-FAC optimizer uses the following hyperparameters:

learning rate = 1e-4
number of gradients = 1024
dampening = 1e-6
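The sketch below shows how the three hyperparameters could interact in a single update step. It is a hypothetical, simplified class written for illustration, not the authors' implementation. It keeps the last `num_grads` gradients and preconditions each gradient with the inverse of the damped empirical Fisher. Unlike the M-FAC paper, it applies that inverse with the Woodbury identity and an m × m linear solve. Treating `damp` as the damping λ and `lr` as the step size is an assumption.

```python
from collections import deque

import numpy as np


class SlidingWindowFisherStep:
    """Hypothetical, simplified M-FAC-style step (not the authors' code).

    Keeps the last `num_grads` gradients g_i and applies the inverse of
        F = damp * I + (1/m) * sum_i g_i g_i^T
    via the Woodbury identity, which only needs an m x m solve:
        F^{-1} v = (v - G^T (m * damp * I + G G^T)^{-1} G v) / damp
    """

    def __init__(self, lr=1e-4, num_grads=1024, damp=1e-6):
        self.lr = lr
        self.damp = damp
        self.window = deque(maxlen=num_grads)  # the oldest gradient drops out automatically

    def precondition(self, v):
        # Requires at least one stored gradient; step() appends before calling this.
        G = np.stack(list(self.window))        # shape (m, d), m = gradients currently stored
        m = G.shape[0]
        small = m * self.damp * np.eye(m) + G @ G.T
        return (v - G.T @ np.linalg.solve(small, G @ v)) / self.damp

    def step(self, w, grad):
        self.window.append(grad.copy())        # the current gradient joins the window first
        return w - self.lr * self.precondition(grad)


# Toy usage with illustrative values: one step on the quadratic loss 0.5 * w^T A w (gradient A w).
A = np.diag([1.0, 2.0, 3.0])
w = np.array([1.0, -1.0, 0.5])
opt = SlidingWindowFisherStep(lr=0.1, num_grads=4, damp=1.0)
w_next = opt.step(w, A @ w)
```

This version forms G Gᵀ from scratch at every step, which is simple but costly for large windows. The M-FAC paper describes more efficient ways to maintain these products as the window slides.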

Results

We share the best model out of 5 runs, which scores the following on the SST-2 validation set:

accuracy = 84.74

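The mean and standard deviation over 5 runs on the SST-2 validation set are shown in the table below:

| Optimizer | SST-2 validation accuracy (%), mean ± std over 5 runs |
|:---------:|:-----------------------------------------------------:|
| Adam | 85.46 ± 0.58 |
| M-FAC | 84.20 ± 0.58 |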

We believe these results could be improved with modest tuning of the hyperparameters per_device_train_batch_size, learning_rate, num_train_epochs, num_grads and damp. For the sake of fair comparison and a robust default setup, we use the same hyperparameters across all models (bert-tiny, bert-mini) and all datasets (SQuAD version 2 and GLUE).
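The tuning suggested above could start from a small grid search. The sketch below only generates `run_glue.py` command lines, one per combination of the five hyperparameters; it does not run them. All fixed flags are copied from the original script. The grid values and the `out_dir/<run_name>/` layout are hypothetical. As in the original command, `--optim MFAC` only works with a `run_glue.py` that has been extended with M-FAC.

```python
import itertools
import json
import shlex

# Hypothetical search grid over the five hyperparameters named above; values are illustrative.
GRID = {
    "per_device_train_batch_size": [16, 32],
    "learning_rate": [5e-5, 1e-4],
    "num_train_epochs": [3, 5],
    "num_grads": [512, 1024],
    "damp": [1e-6, 1e-5],
}


def build_commands(grid, seed=1234):
    """Yield one shell-quoted run_glue.py command line per grid point."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        # Keep the lr inside --optim_args identical to --learning_rate, as the original script does.
        optim_args = json.dumps({"lr": cfg["learning_rate"],
                                 "num_grads": cfg["num_grads"],
                                 "damp": cfg["damp"]})
        run_name = "_".join(f"{k}{v}" for k, v in cfg.items())
        args = [
            "python", "run_glue.py",
            "--seed", str(seed),
            "--model_name_or_path", "prajjwal1/bert-mini",
            "--task_name", "sst2",
            "--do_train", "--do_eval",
            "--max_seq_length", "128",
            "--per_device_train_batch_size", str(cfg["per_device_train_batch_size"]),
            "--learning_rate", str(cfg["learning_rate"]),
            "--num_train_epochs", str(cfg["num_train_epochs"]),
            "--output_dir", f"out_dir/{run_name}/",
            "--optim", "MFAC",
            "--optim_args", optim_args,
        ]
        yield shlex.join(args)


for cmd in build_commands(GRID):
    print(cmd)
```

A grid with two values per hyperparameter already yields 32 runs. Repeating each run over several seeds, as the 5-run statistics above do, multiplies that cost.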

Our code for M-FAC can be found here: [https://github.com/IST-DASLab/M-FAC](https://github.com/IST-DASLab/M-FAC). A step-by-step tutorial on how to integrate and use M-FAC with any repository can be found here: [https://github.com/IST-DASLab/M-FAC/tree/master/tutorials](https://github.com/IST-DASLab/M-FAC/tree/master/tutorials).


🔧 Technical Details

The citation information for the M-FAC optimizer is as follows:

```bibtex
@article{frantar2021m,
  title={M-FAC: Efficient Matrix-Free Approximations of Second-Order Information},
  author={Frantar, Elias and Kurtic, Eldar and Alistarh, Dan},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}
```
