# T5ForSequenceClassification
T5ForSequenceClassification adapts the original T5 architecture for sequence classification tasks. T5 was initially designed for text-to-text tasks and can handle any NLP task once it is cast into a text-to-text format, including sequence classification. By removing the decoder, this model halves the original parameter count and is efficiently optimized for sequence classification.
## Quick Start
T5ForSequenceClassification supports zero-shot classification tasks. It can be used directly for:
- Topic classification
- Intent recognition
- Boolean question answering
- Sentiment analysis
- Any other text classification task
Since the T5ForClassification class is not currently supported by the transformers library, you cannot use this model directly from the Hub. To use T5ForSequenceClassification, you need to install additional packages and model weights; instructions are available [here](https://github.com/AntoineBlanot/zero-nlp).
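Entailment-based zero-shot classifiers are typically fed a premise/hypothesis pair, one pair per candidate label. The sketch below shows this input framing; the exact template used by the zero-nlp package is an assumption, so treat `build_nli_input` as a hypothetical helper, not the package's API.

```python
# Hypothetical sketch: turning a zero-shot topic classification query into
# NLI-style premise/hypothesis pairs, one per candidate label. The template
# string is an illustrative assumption, not the zero-nlp package's format.
def build_nli_input(text: str, candidate_label: str) -> str:
    premise = text
    hypothesis = f"This example is about {candidate_label}."
    return f"Premise: {premise} Hypothesis: {hypothesis}"

labels = ["sports", "politics", "technology"]
inputs = [build_nli_input("The match went to extra time.", lbl) for lbl in labels]
# The model would then score each pair for entailment and pick the best label.
```

The label whose hypothesis receives the highest entailment score is returned as the prediction.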
## Features
### Why use T5ForSequenceClassification?
Models based on the [BERT](https://huggingface.co/bert-large-uncased) architecture, such as [RoBERTa](https://huggingface.co/roberta-large) and [DeBERTa](https://huggingface.co/microsoft/deberta-v2-xxlarge), perform well on sequence classification tasks but top out at roughly 1.5B parameters. In contrast, models based on the T5 architecture scale up to ~11B parameters, and the architecture continues to benefit from recent innovations.
### T5ForClassification vs T5
T5ForClassification Architecture:
- Encoder: same as the original T5
- Decoder: only the first layer (used for pooling)
- Classification head: a simple Linear layer on top of the decoder
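The architecture above can be sketched numerically: encoder states are pooled by a single decoder layer, then mapped to class logits by one linear layer. The following NumPy toy stands in for that forward pass; the attention-style pooling stub, shapes, and random weights are illustrative assumptions, not the real implementation.

```python
# Minimal numpy sketch of the forward pass: encoder output -> pooling by a
# single decoder query (the role of the kept first decoder layer) -> linear
# classification head. All weights here are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
d_model, num_classes, seq_len = 8, 3, 5

encoder_states = rng.normal(size=(seq_len, d_model))  # stand-in for T5 encoder output
query = rng.normal(size=(d_model,))                   # single decoder query token

# Attention-style pooling over the encoder states
scores = encoder_states @ query / np.sqrt(d_model)
weights = np.exp(scores - scores.max())
weights /= weights.sum()
pooled = weights @ encoder_states                     # shape: (d_model,)

# Classification head: one linear layer on top of the pooled representation
W = rng.normal(size=(d_model, num_classes))
b = np.zeros(num_classes)
logits = pooled @ W + b                               # shape: (num_classes,)
```

Because the output is a fixed-size logit vector rather than generated text, predictions are directly interpretable as class scores.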
Benefits and Drawbacks:
- (+) Retains T5's encoding strength
- (+) Halves the parameter size
- (+) Provides interpretable outputs (class logits)
- (+) Avoids generation mistakes and has faster prediction (no generation latency)
- (-) Loses text-to-text ability
## Documentation
Table of Contents
- Usage
- Why use T5ForSequenceClassification?
- T5ForClassification vs T5
- Results
## Technical Details
T5 was originally built for text-to-text tasks and excels at them. It can handle any NLP task that has been converted to a text-to-text format, including sequence classification. You can see [here](https://huggingface.co/google/flan-t5-base?text=Premise%3A++At+my+age+you+will+probably+have+learnt+one+lesson.+Hypothesis%3A++It%27s+not+certain+how+many+lessons+you%27ll+learn+by+your+thirties.+Does+the+premise+entail+the+hypothesis%3F) how the original T5 is used for a sequence classification task.
Our motivation for building T5ForSequenceClassification is that the full original T5 architecture is not needed for most NLU tasks. NLU tasks generally do not require text generation, so a large decoder is unnecessary. By removing the decoder, we can halve the original number of parameters (and thus the computation cost) and efficiently optimize the network for the given task.
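A back-of-the-envelope check of the halving claim: T5's encoder and decoder are roughly symmetric stacks, so keeping the encoder plus one decoder layer retains close to half the network. The function below is a simplified sketch; real counts differ somewhat because decoder layers also carry cross-attention, and the 24-layer figure assumes the public t5-xxl config.

```python
# Rough fraction of transformer layers kept when the decoder is cut down to
# one layer. Treats encoder and decoder layers as equal-sized, which is a
# simplification (decoder layers also contain cross-attention).
def approx_kept_fraction(num_layers: int, kept_decoder_layers: int = 1) -> float:
    total = 2 * num_layers                  # full encoder + full decoder
    kept = num_layers + kept_decoder_layers  # encoder + first decoder layer
    return kept / total

# t5-xxl has 24 encoder and 24 decoder layers, so roughly half survives:
print(round(approx_kept_fraction(24), 3))  # -> 0.521
```

This is why the stripped-down model runs at roughly half the compute of the full text-to-text T5.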
## Results
Results on the validation data of the training tasks:

| Dataset   | Accuracy | F1    |
|-----------|----------|-------|
| MNLI (m)  | 0.923    | 0.923 |
| MNLI (mm) | 0.922    | 0.922 |
| SNLI      | 0.942    | 0.942 |
| SciTail   | 0.966    | 0.647 |
Results on the validation data of unseen tasks (zero-shot):

| Dataset | Accuracy | F1 |
|---------|----------|----|
| ?       | ?        | ?  |
## Acknowledgments
Special thanks to philschmid for providing a Flan-T5-xxl [checkpoint](https://huggingface.co/philschmid/flan-t5-xxl-sharded-fp16) in fp16.
## Dataset and Metrics
| Property     | Details                                |
|--------------|----------------------------------------|
| Datasets     | multi_nli, snli, scitail               |
| Metrics      | accuracy, f1                           |
| Pipeline Tag | zero-shot-classification               |
| Language     | en                                     |
| Model Index  | AntoineBlanot/flan-t5-xxl-classif-3way |