BERT-Tiny-Finetuned-MRPC Open-Source Text Classification Model - Free Deployment for Precise Text Classification

Developed by M-FAC

This model is based on the BERT-tiny architecture and fine-tuned for text classification on the MRPC dataset using the M-FAC second-order optimizer.

Text Classification

Transformers

#Second-order optimization fine-tuning #Text semantic matching #Lightweight BERT

Release Time: 3/2/2022

Model Overview

The model is primarily designed for sentence pair classification tasks, specifically optimized for performance on the MRPC (Microsoft Research Paraphrase Corpus) dataset.

Model Features

M-FAC second-order optimization

Fine-tuned with the M-FAC second-order optimizer, which outperforms the default Adam baseline on the MRPC validation set (mean F1 82.77 vs. 81.68 and mean accuracy 72.94 vs. 69.90 over 5 runs).

Lightweight architecture

Based on the BERT-tiny architecture with fewer parameters, making it suitable for resource-constrained environments.

Robust performance

Results are stable across 5 runs: on the MRPC validation set the standard deviation is 0.22 for F1 and 0.37 for accuracy.

Model Capabilities

Text classification

Sentence similarity judgment

Semantic equivalence detection

Use Cases

Natural Language Processing

Text paraphrase detection

Determines whether two sentences are paraphrases of each other

Achieves an F1 score of 83.12 on the MRPC validation set (best of 5 runs)

Semantic similarity analysis

Evaluates the semantic similarity between two sentences
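The use cases above can be sketched with the Hugging Face Transformers library. This is a minimal sketch, not an official usage example: the Hub identifier `M-FAC/bert-tiny-finetuned-mrpc` is inferred from the developer and model name on this page, and the label order (index 1 = paraphrase) is assumed from the GLUE MRPC convention. The helper names `label_from_logits` and `classify_pair` are hypothetical.

```python
import math

# Assumed Hub ID, inferred from the developer and model name on this page.
MODEL_ID = "M-FAC/bert-tiny-finetuned-mrpc"

# Assumed label order, following the GLUE MRPC convention (1 = equivalent).
LABELS = ("not_paraphrase", "paraphrase")


def label_from_logits(logits):
    """Apply a softmax to the logits and return (label, probability)."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


def classify_pair(sentence_a, sentence_b, model_id=MODEL_ID):
    """Download the checkpoint and classify one sentence pair."""
    # Imported lazily so label_from_logits works without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Encode both sentences as a single pair, truncated to 128 tokens
    # to match the max_seq_length used for finetuning.
    inputs = tokenizer(sentence_a, sentence_b, truncation=True,
                       max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return label_from_logits(logits)
```

Calling `classify_pair("The company posted record profits.", "Profits at the firm hit an all-time high.")` returns a label and its probability. It is worth checking the `id2label` field of the downloaded config before relying on the assumed label order.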

🚀 BERT-tiny model finetuned with M-FAC

This is a BERT-tiny model finetuned on the MRPC dataset with the second-order optimizer M-FAC. On the MRPC validation set it scores higher than the same model finetuned with the default Adam optimizer.

🚀 Quick Start

This model is finetuned on the MRPC dataset with the state-of-the-art second-order optimizer M-FAC. For more details on M-FAC, check the NeurIPS 2021 paper: https://arxiv.org/pdf/2107.03356.pdf.

✨ Features

Advanced Optimization: Utilizes the M-FAC optimizer for finetuning, which is a state-of-the-art second-order optimization method.
Fair Comparison Setup: Finetuned in the same framework as the default Adam baseline for fair performance comparison.

📦 Installation

The original README gives no installation steps. To use this model, refer to the following repositories:

Hugging Face Transformers: https://github.com/huggingface/transformers
M-FAC Code: https://github.com/IST-DASLab/M-FAC

💻 Usage Examples

Finetuning setup

For a fair comparison against the default Adam baseline, we finetune the model in the same framework, described at https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification, and only swap the Adam optimizer for M-FAC.

Hyperparameters used by the M-FAC optimizer:

learning rate = 1e-4
number of gradients = 512
dampening = 1e-6
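To give a feel for what "number of gradients" and "dampening" control, here is a minimal NumPy sketch. It assumes the dampened empirical Fisher form damp * I + (1/m) * sum of g_i g_i^T built from the m most recent gradients, which is our reading of the M-FAC paper, and it applies the textbook Woodbury identity to get an inverse-matrix-vector product. It is not the authors' algorithm or code (the M-FAC repository implements its own, more efficient schemes), and the function name is hypothetical.

```python
import numpy as np


def damped_fisher_inverse_vector(grads, vec, damp):
    """Return (damp * I + grads.T @ grads / m)^-1 @ vec via the Woodbury identity.

    grads: (m, d) array holding the m most recent gradients, one per row.
    vec:   (d,) vector to precondition, for example the current gradient.
    damp:  positive dampening constant.
    """
    m = grads.shape[0]
    # Only an m x m linear system is solved, never a d x d one.
    small = damp * m * np.eye(m) + grads @ grads.T
    correction = grads.T @ np.linalg.solve(small, grads @ vec)
    return (vec - correction) / damp


# Toy check against a direct dense solve (illustrative sizes and dampening).
rng = np.random.default_rng(0)
G = rng.normal(size=(8, 20))   # m = 8 gradients of dimension d = 20
v = rng.normal(size=20)
fast = damped_fisher_inverse_vector(G, v, damp=1e-2)
direct = np.linalg.solve(1e-2 * np.eye(20) + G.T @ G / 8, v)
print(np.allclose(fast, direct))
```

In this sketch a larger gradient window gives a richer curvature estimate at higher memory cost, and the dampening keeps the matrix invertible. The toy check uses a larger dampening than the 1e-6 listed above purely to keep the dense reference solve well conditioned.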

Results Reproduction

The results can be reproduced by adding the M-FAC optimizer code to https://github.com/huggingface/transformers/blob/master/examples/pytorch/text-classification/run_glue.py and running the following bash script:

CUDA_VISIBLE_DEVICES=0 python run_glue.py \
  --seed 42 \
  --model_name_or_path prajjwal1/bert-tiny \
  --task_name mrpc \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 1e-4 \
  --num_train_epochs 5 \
  --output_dir out_dir/ \
  --optim MFAC \
  --optim_args '{"lr": 1e-4, "num_grads": 512, "damp": 1e-6}'

📚 Documentation

Results

We share the best model out of 5 runs with the following score on the MRPC validation set:

f1 = 83.12
accuracy = 73.52
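For readers unfamiliar with the two metrics, the following sketch shows how accuracy and F1 are computed for a binary task such as MRPC, with both reported on a 0 to 100 scale as above. The labels and predictions are made-up toy values, not model outputs, and the function name is hypothetical.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) in percent for binary labels, positive class = 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = 100.0 * correct / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 100.0 * 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy data: 1 = paraphrase, 0 = not a paraphrase.
toy_true = [1, 1, 1, 0, 0, 1]
toy_pred = [1, 1, 0, 1, 0, 1]
toy_accuracy, toy_f1 = accuracy_and_f1(toy_true, toy_pred)
print(f"accuracy = {toy_accuracy:.2f}, f1 = {toy_f1:.2f}")
```

F1 ignores true negatives, so it can sit well above accuracy when most predictions are positive, as in this toy example.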

Mean and standard deviation for 5 runs on the MRPC validation set:

Optimizer	F1	Accuracy
Adam	81.68 ± 0.33	69.90 ± 0.32
M-FAC	82.77 ± 0.22	72.94 ± 0.37
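The gap between the two optimizers can be read directly off the table. The short sketch below only redoes that arithmetic on the reported means and standard deviations. The dictionary layout and function name are illustrative choices, not part of the original README.

```python
# Reported (mean, standard deviation) over 5 runs on the MRPC validation set.
results = {
    "Adam":  {"f1": (81.68, 0.33), "accuracy": (69.90, 0.32)},
    "M-FAC": {"f1": (82.77, 0.22), "accuracy": (72.94, 0.37)},
}


def mean_gain(metric):
    """Difference in mean score, M-FAC minus Adam, rounded to two decimals."""
    return round(results["M-FAC"][metric][0] - results["Adam"][metric][0], 2)


for metric in ("f1", "accuracy"):
    largest_std = max(results[name][metric][1] for name in results)
    print(f"{metric}: gain = {mean_gain(metric)}, largest std = {largest_std}")
```

For both metrics the gain in the mean is several times larger than either optimizer's reported standard deviation.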

We believe these results could be improved with modest tuning of hyperparameters: per_device_train_batch_size, learning_rate, num_train_epochs, num_grads and damp. For the sake of fair comparison and a robust default setup, we use the same hyperparameters across all models (bert-tiny, bert-mini) and all datasets (SQuAD version 2 and GLUE).

📄 License

No license information is provided in the original README.

📚 Citation

@article{frantar2021m,
  title={M-FAC: Efficient Matrix-Free Approximations of Second-Order Information},
  author={Frantar, Elias and Kurtic, Eldar and Alistarh, Dan},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}
