DistilBERT Open-Source Text Classification Model: Near-BERT Performance with Smaller Size and Faster Inference

DistilBERT Base Uncased MNLI

Developed by Typeform

DistilBERT is a distilled version of BERT that retains 97% of BERT's performance while being 40% smaller and 60% faster.

Large Language Model

Transformers

English #Zero-shot classification #Lightweight BERT

Release Time : 3/2/2022

Model Overview

DistilBERT is a lightweight model based on BERT, trained using knowledge distillation techniques, suitable for various natural language processing tasks.

Model Features

Lightweight and Efficient

40% smaller in size and 60% faster in inference compared to the original BERT model

High Performance

Retains 97% of the performance of the BERT model

Multi-task Support

Suitable for various natural language processing tasks

Model Capabilities

Text classification

Zero-shot classification

Natural language understanding

Use Cases

Text Analysis

Sentiment Analysis

Analyze the sentiment tendency of text

High-accuracy sentiment classification

Topic Classification

Classify text into predefined categories

Customer Service

Intent Recognition

Identify the intent of user queries

🚀 DistilBERT base model (uncased)

A zero-shot classification model fine-tuned on the Multi-Genre Natural Language Inference dataset.

🚀 Quick Start

To get started with the model, you can use the following code:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the MNLI fine-tuned classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("typeform/distilbert-base-uncased-mnli")
model = AutoModelForSequenceClassification.from_pretrained("typeform/distilbert-base-uncased-mnli")
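Because the checkpoint is an NLI classifier, zero-shot classification can be done by pairing the input text (the premise) with one hypothesis per candidate label and comparing the entailment scores. The following is a minimal sketch of that idea, assuming the `transformers` and `torch` packages are installed. The helper names, the hypothesis template, and the example labels are illustrative assumptions and are not part of the original model card. The entailment class index is looked up from the model config rather than hard-coded.

```python
import math


def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def find_entailment_id(label2id):
    """Find the entailment class index from a label-to-id mapping."""
    for name, idx in label2id.items():
        if name.lower().startswith("entail"):
            return idx
    raise ValueError("no entailment label found in model config")


def zero_shot_classify(text, labels,
                       model_name="typeform/distilbert-base-uncased-mnli"):
    """Rank candidate labels by entailment score (downloads the model on first use)."""
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    hypotheses = build_hypotheses(labels)
    # One (premise, hypothesis) pair per candidate label.
    inputs = tokenizer([text] * len(labels), hypotheses,
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits

    entail_id = find_entailment_id(model.config.label2id)
    entail_logits = logits[:, entail_id].tolist()
    # Single-label setting: softmax of the entailment logits across labels.
    probs = softmax(entail_logits)
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
```

A call such as `zero_shot_classify("I want to update my billing address", ["billing", "technical support", "sales"])` would return the labels sorted by score. The `pipeline("zero-shot-classification", model=...)` helper in `transformers` wraps a similar procedure and is likely the more convenient choice in practice.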

✨ Features

This model can be used for text classification tasks.

📚 Documentation

🔍 Model Details

Model Description: This is the uncased DistilBERT model fine-tuned on the Multi-Genre Natural Language Inference (MNLI) dataset for zero-shot classification.
Developed by: The Typeform team
Model Type: Zero-Shot Classification
Language(s): English
License: Unknown
Parent Model: See the distilbert-base-uncased model for more information about the DistilBERT base model.

Property	Details
Model Type	Zero-Shot Classification
Training Data	Multi-Genre Natural Language Inference (MultiNLI) corpus

🛠️ Uses

This model can be used for text classification tasks.

⚠️ Risks, Limitations and Biases

⚠️ Important Note

CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)).

🏋️ Training

🏋️‍♂️ Training Data

This DistilBERT-uncased model is fine-tuned on the Multi-Genre Natural Language Inference (MultiNLI) corpus, a crowd-sourced collection of 433k sentence pairs annotated with textual entailment information. The corpus covers a range of genres of spoken and written text and supports a distinctive cross-genre generalization evaluation. The model is not case-sensitive, i.e., it does not distinguish between "english" and "English".

🏋️‍♀️ Training Procedure

Training was done on an AWS EC2 p3.2xlarge instance with the following hyperparameters:

$ python run_glue.py \
    --model_name_or_path distilbert-base-uncased \
    --task_name mnli \
    --do_train \
    --do_eval \
    --max_seq_length 128 \
    --per_device_train_batch_size 16 \
    --learning_rate 2e-5 \
    --num_train_epochs 5 \
    --output_dir /tmp/distilbert-base-uncased_mnli/
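As a rough sanity check on the size of this run, the number of optimizer steps follows from the batch size and epoch count in the command above. This sketch assumes a single GPU with no gradient accumulation, and a training split of 392,702 examples. That split size is not stated in this card and should be treated as an assumption.

```python
import math

TRAIN_EXAMPLES = 392_702   # assumed size of the MNLI training split
BATCH_SIZE = 16            # --per_device_train_batch_size
NUM_EPOCHS = 5             # --num_train_epochs
NUM_GPUS = 1               # single V100, per the hardware listed in this card


def total_optimizer_steps(n_examples, batch_size, epochs, n_gpus=1):
    """Count optimizer updates, keeping the last partial batch of each epoch."""
    effective_batch = batch_size * n_gpus
    steps_per_epoch = math.ceil(n_examples / effective_batch)
    return steps_per_epoch * epochs


steps = total_optimizer_steps(TRAIN_EXAMPLES, BATCH_SIZE, NUM_EPOCHS, NUM_GPUS)
print(steps)  # 122720
```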

📊 Evaluation

📈 Evaluation Results

After fine-tuning on MNLI, the model achieves the following evaluation results:

Epoch = 5.0
Evaluation Accuracy = 0.8206875508543532
Evaluation Loss = 0.8706700205802917
Evaluation Runtime = 17.8278 seconds
Evaluation Samples per second = 551.498
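The runtime and throughput figures imply the size of the evaluation set, which in turn lets the accuracy be read as a count of correct predictions. The following is a small worked check that uses only the numbers reported above and assumes the runtime is in seconds. The comparison with the MNLI validation split sizes (9,815 matched, 9,832 mismatched) relies on figures that are not stated in this card and is an assumption.

```python
EVAL_ACCURACY = 0.8206875508543532
EVAL_RUNTIME_S = 17.8278          # assumed to be seconds
EVAL_SAMPLES_PER_S = 551.498

# Throughput times runtime recovers the number of evaluated examples.
n_examples = round(EVAL_RUNTIME_S * EVAL_SAMPLES_PER_S)

# Accuracy times example count recovers the number of correct predictions.
n_correct = round(EVAL_ACCURACY * n_examples)

print(n_examples)  # 9832
print(n_correct)   # 8069
```

Under these assumptions, the log corresponds to 8,069 correct predictions out of 9,832 examples, a count that matches the mismatched validation split rather than the matched one.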

MNLI and MNLI-mm results:

Task	MNLI	MNLI-mm
Accuracy (%)	82.0	82.0

🌱 Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). We present the hardware type based on the associated paper.

Hardware Type: 1 NVIDIA Tesla V100 GPU
Hours used: Unknown
Cloud Provider: AWS EC2 P3
Compute Region: Unknown
Carbon Emitted (power consumption × time × carbon intensity of the local power grid): Unknown
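The carbon estimate is a product of three quantities, which can be sketched as a small helper. The function name and all input values below are hypothetical placeholders chosen only to show the arithmetic. The card lists the training hours and compute region as unknown, so no actual figure for this model can be derived from it.

```python
def estimate_co2_kg(power_watts, hours, grid_kg_co2_per_kwh):
    """Estimate emissions as energy used (kWh) times grid carbon intensity."""
    if min(power_watts, hours, grid_kg_co2_per_kwh) < 0:
        raise ValueError("inputs must be non-negative")
    energy_kwh = power_watts / 1000.0 * hours
    return energy_kwh * grid_kg_co2_per_kwh


# Placeholder inputs only: the card does not report hours or region.
example = estimate_co2_kg(power_watts=300.0, hours=10.0, grid_kg_co2_per_kwh=0.4)
print(f"{example:.2f} kg CO2eq")
```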
