🚀 BERT-tiny model finetuned with M-FAC
This project presents a BERT-tiny model finetuned on the SST-2 dataset with M-FAC, a state-of-the-art second-order optimizer, as an efficient alternative to the default Adam baseline for text classification.
🚀 Quick Start
Prerequisites
To reproduce the results, you need to set up the environment as described in https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification.
Installation
You can install the necessary dependencies by following the instructions in the above repository.
Usage
To finetune and evaluate the model, use the following bash command:
CUDA_VISIBLE_DEVICES=0 python run_glue.py \
--seed 42 \
--model_name_or_path prajjwal1/bert-tiny \
--task_name sst2 \
--do_train \
--do_eval \
--max_seq_length 128 \
--per_device_train_batch_size 32 \
--learning_rate 1e-4 \
--num_train_epochs 3 \
--output_dir out_dir/ \
--optim MFAC \
--optim_args '{"lr": 1e-4, "num_grads": 1024, "damp": 1e-6}'
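Once training completes, the finetuned checkpoint in out_dir/ (the --output_dir above) can be loaded for inference with the standard transformers pipeline API. The snippet below is a minimal sketch; the example sentence is illustrative only.

```python
# Minimal inference sketch for the checkpoint produced by the command above.
from transformers import pipeline

# "out_dir/" is the --output_dir used during finetuning.
classifier = pipeline("text-classification", model="out_dir/")

# Illustrative example sentence; the output is a list of {label, score} dicts.
print(classifier("a charming and often affecting journey"))
```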
✨ Features
- Advanced Optimizer: Uses M-FAC, a state-of-the-art matrix-free second-order optimizer that can outperform first-order optimizers such as Adam.
- Fair Comparison: The model is finetuned in the same framework as the default Adam baseline, ensuring a fair comparison.
- Reproducible Results: The detailed hyperparameters and running scripts are provided, allowing users to reproduce the results.
📦 Installation
The installation process is based on the framework described in https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification. The only change needed is to swap the default Adam optimizer for M-FAC.
Hyperparameters for M-FAC
- learning rate = 1e-4
- number of gradients = 1024
- dampening = 1e-6
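If you are running your own training loop rather than the patched run_glue.py shown above, the same hyperparameters can be wired into a Hugging Face Trainer. The sketch below is only an illustration under assumptions: the MFAC import path is hypothetical and should point at whichever module provides the optimizer class in your environment (for example, the authors' reference implementation).

```python
# Sketch only: hook an M-FAC optimizer into the Hugging Face Trainer.
# The import path below is hypothetical -- point it at whichever module
# provides the M-FAC optimizer class in your setup.
from mfac_optim import MFAC  # hypothetical module name

from transformers import Trainer


class MFACTrainer(Trainer):
    def create_optimizer(self):
        # Reuse the hyperparameters listed above.
        if self.optimizer is None:
            self.optimizer = MFAC(
                self.model.parameters(),
                lr=1e-4,         # learning rate
                num_grads=1024,  # number of gradients kept for the second-order estimate
                damp=1e-6,       # dampening
            )
        return self.optimizer
```

Instead of subclassing, a pre-built optimizer can also be passed to Trainer via its optimizers=(optimizer, lr_scheduler) argument.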
💻 Usage Examples
Basic Usage
To reproduce the results, run the same bash command shown in the Quick Start section above; it already sets the seed and the M-FAC hyperparameters listed in the Installation section.
Advanced Usage
You can adjust hyperparameters such as `per_device_train_batch_size`, `learning_rate`, `num_train_epochs`, `num_grads`, and `damp` to potentially improve performance.
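For example, a small grid over num_grads and damp can be run by re-invoking run_glue.py with different --optim_args values. The sketch below is illustrative only: the grid values are assumptions, not the settings used for the reported results.

```python
# Sketch: sweep a few M-FAC hyperparameters by re-running run_glue.py.
import json
import subprocess

for num_grads in (512, 1024):      # illustrative values, not from the original runs
    for damp in (1e-6, 1e-5):      # illustrative values, not from the original runs
        optim_args = json.dumps({"lr": 1e-4, "num_grads": num_grads, "damp": damp})
        subprocess.run(
            [
                "python", "run_glue.py",
                "--seed", "42",
                "--model_name_or_path", "prajjwal1/bert-tiny",
                "--task_name", "sst2",
                "--do_train", "--do_eval",
                "--max_seq_length", "128",
                "--per_device_train_batch_size", "32",
                "--learning_rate", "1e-4",
                "--num_train_epochs", "3",
                "--output_dir", f"out_dir/num_grads{num_grads}_damp{damp}/",
                "--optim", "MFAC",
                "--optim_args", optim_args,
            ],
            check=True,
        )
```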
📚 Documentation
For more details on the M-FAC optimizer, please check the NeurIPS 2021 paper: https://arxiv.org/pdf/2107.03356.pdf.
🔧 Technical Details
The model is finetuned on the SST-2 dataset. For fair comparison, it follows the same framework as the default Adam baseline, only replacing the Adam optimizer with M-FAC.
📄 License
No license information is provided.
Results
Best Model Score
We share the best model out of 5 runs, with the following score on the SST-2 validation set:
accuracy = 83.02
Mean and Standard Deviation
| Optimizer | SST-2 validation accuracy (mean ± std over 5 runs) |
|-----------|-----------------------------------------------------|
| Adam | 80.11 ± 0.65 |
| M-FAC | 81.86 ± 0.76 |
BibTeX Entry
@article{frantar2021m,
title={M-FAC: Efficient Matrix-Free Approximations of Second-Order Information},
author={Frantar, Elias and Kurtic, Eldar and Alistarh, Dan},
journal={Advances in Neural Information Processing Systems},
volume={35},
year={2021}
}