BERT-mini Finetuned on MNLI: Open-Source Text Classification Model

Bert Mini Finetuned Mnli

Developed by M-FAC

This model is based on the BERT-mini architecture, fine-tuned on the MNLI dataset using the M-FAC second-order optimizer for text classification tasks.

Text Classification

Transformers

#Second-order optimization fine-tuning #MNLI textual reasoning #Efficient matrix approximation

Downloads 290.56k

Release Time: 3/2/2022

Model Overview

The model is primarily designed for natural language inference tasks, with performance improvements on the MNLI dataset achieved through the M-FAC optimizer.

Model Features

M-FAC second-order optimization

Fine-tuned with the M-FAC second-order optimizer, which scores about 1.3 points (matched) and 1.1 points (mismatched) higher than the default Adam baseline on the MNLI validation set, averaged over 5 runs.

Lightweight architecture

Based on the BERT-mini architecture, which has far fewer parameters than BERT-base, making it suitable for resource-constrained environments.

Robust performance

Accuracy is stable across 5 runs, with standard deviations of at most 0.41 points on the MNLI validation set.

Model Capabilities

Text classification

Natural language inference

Use Cases

Text understanding

Natural language inference

Determining the relationship between two sentences (entailment, contradiction, or neutral)

The shared checkpoint reaches 75.13% matched and 75.93% mismatched accuracy on the MNLI validation set.
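The following is a minimal sketch of how a premise/hypothesis pair might be classified with this model through the Transformers library. The model id `M-FAC/bert-mini-finetuned-mnli` and the fallback label order (entailment, neutral, contradiction, as in GLUE MNLI) are assumptions not stated on this page; check `model.config.id2label` before relying on either. The helper names are illustrative only.

```python
import math

# Assumed GLUE MNLI label order; verify against model.config.id2label.
ASSUMED_LABELS = ("entailment", "neutral", "contradiction")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_prediction(logits, labels=ASSUMED_LABELS):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


def classify_pair(premise, hypothesis,
                  model_id="M-FAC/bert-mini-finetuned-mnli"):  # assumed id
    """Download the checkpoint and classify one sentence pair."""
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    inputs = tokenizer(premise, hypothesis, return_tensors="pt",
                       truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    # Labels come from the checkpoint's own config, not from ASSUMED_LABELS.
    labels = [model.config.id2label[i] for i in range(len(logits))]
    return logits_to_prediction(logits, labels)
```

Calling `classify_pair("A man is playing a guitar.", "A person is making music.")` would return a label and its probability. If the checkpoint reports generic names such as `LABEL_0`, map them through `ASSUMED_LABELS` only after confirming the order.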

🚀 BERT-mini model finetuned with M-FAC

This project presents a BERT-mini model finetuned on the MNLI dataset with the second-order optimizer M-FAC. In the reported experiments it outperforms the default Adam baseline.

🚀 Quick Start

This model is finetuned on the MNLI dataset using the state-of-the-art second-order optimizer M-FAC. For more details about M-FAC, refer to the NeurIPS 2021 paper: https://arxiv.org/pdf/2107.03356.pdf.

✨ Features

The model is finetuned using M-FAC, a second-order optimizer, which outperforms the default Adam baseline in the reported MNLI results.
The finetuning setup is designed for a fair comparison with the default Adam baseline.

💻 Usage Examples

Basic Usage

To reproduce the results, add the M-FAC optimizer code to https://github.com/huggingface/transformers/blob/master/examples/pytorch/text-classification/run_glue.py and run the following bash script:

CUDA_VISIBLE_DEVICES=0 python run_glue.py \
  --seed 8276 \
  --model_name_or_path prajjwal1/bert-mini \
  --task_name mnli \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 1e-4 \
  --num_train_epochs 5 \
  --output_dir out_dir/ \
  --optim MFAC \
  --optim_args '{"lr": 1e-4, "num_grads": 1024, "damp": 1e-6}'

📚 Documentation

Finetuning setup

For a fair comparison against the default Adam baseline, we finetune the model in the same framework as described here https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification and just swap the Adam optimizer with M-FAC. The hyperparameters used by the M-FAC optimizer are:

learning rate = 1e-4
number of gradients = 1024
dampening = 1e-6
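To give some intuition for the "number of gradients" and "dampening" settings, here is a naive sketch of the kind of quantity a matrix-free second-order method works with: the product of a dampened inverse empirical Fisher matrix, built from a window of m stored gradients, with a new gradient. It uses the Woodbury identity so that only an m x m system is solved. This is an illustration under stated assumptions (including the 1/m scaling and the function name), not the M-FAC algorithm from the authors' repository, which uses a more efficient formulation.

```python
import numpy as np


def damped_fisher_inverse_times(grads, vec, damp):
    """Compute (damp*I + grads.T @ grads / m)^-1 @ vec without a d x d matrix.

    grads: array of shape (m, d), one stored gradient per row.
    vec:   array of shape (d,), e.g. the current gradient.
    damp:  positive dampening constant.
    """
    m = grads.shape[0]
    # Woodbury identity: only an m x m linear system needs to be solved.
    small = m * damp * np.eye(m) + grads @ grads.T
    correction = grads.T @ np.linalg.solve(small, grads @ vec)
    return (vec - correction) / damp


# Tiny check against the dense computation (toy sizes, toy dampening).
rng = np.random.default_rng(0)
G = rng.normal(size=(4, 10))   # m = 4 gradients of dimension d = 10
g = rng.normal(size=10)
damp = 0.1
fast = damped_fisher_inverse_times(G, g, damp)
dense = np.linalg.solve(damp * np.eye(10) + G.T @ G / 4, g)
print(np.allclose(fast, dense))
```

In this picture, a larger gradient window gives a richer curvature estimate at higher memory cost, while the dampening term keeps the matrix invertible.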

Results

We share the best model out of 5 runs, which achieves the following scores on the MNLI validation set:

matched_accuracy = 75.13
mismatched_accuracy = 75.93

The mean and standard deviation for 5 runs on the MNLI validation set are as follows:

Optimizer	Matched Accuracy	Mismatched Accuracy
Adam	73.30 ± 0.20	74.85 ± 0.09
M-FAC	74.59 ± 0.41	75.95 ± 0.14
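As a quick worked check, the gap between the two optimizers can be read directly off the mean accuracies in the table above. The short sketch below only subtracts the reported means; the dictionary layout and function name are illustrative.

```python
# Mean accuracies (percent) copied from the table above.
results = {
    "Adam": {"matched": 73.30, "mismatched": 74.85},
    "M-FAC": {"matched": 74.59, "mismatched": 75.95},
}


def gain_over_adam(split):
    """Difference in mean accuracy (M-FAC minus Adam), in points."""
    return round(results["M-FAC"][split] - results["Adam"][split], 2)


for split in ("matched", "mismatched"):
    print(f"{split}: +{gain_over_adam(split):.2f} points")
# matched: +1.29 points
# mismatched: +1.10 points
```

Both gaps are several times larger than the reported standard deviations, which is consistent with the improvement not being run-to-run noise.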

We believe these results could be improved with modest tuning of hyperparameters: per_device_train_batch_size, learning_rate, num_train_epochs, num_grads and damp. For the sake of fair comparison and a robust default setup, we use the same hyperparameters across all models (bert-tiny, bert-mini) and all datasets (SQuAD version 2 and GLUE).

Our code for M-FAC can be found here: https://github.com/IST-DASLab/M-FAC. A step-by-step tutorial on how to integrate and use M-FAC with any repository can be found here: https://github.com/IST-DASLab/M-FAC/tree/master/tutorials.

BibTeX entry and citation info

@article{frantar2021m,
  title={M-FAC: Efficient Matrix-Free Approximations of Second-Order Information},
  author={Frantar, Elias and Kurtic, Eldar and Alistarh, Dan},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}