Gliclass-base-v1.0 Open-Source Zero-Shot Classifier - Free for Text Classification, Sentiment Analysis, and Re-ranking

Developed by knowledgator

GLiClass is an efficient zero-shot classifier inspired by GLiNER, suitable for text classification, sentiment analysis, and reranking tasks in RAG workflows.

Text Classification

Transformers

Language: English

License: Apache-2.0 (open source)

Tags: zero-shot classification, efficient single-pass inference, multi-label classification

Downloads 152

Release Time: 7/3/2024

Model Overview

A general-purpose, lightweight sequence classification model that supports zero-shot learning. It is well suited to multi-label classification and is computationally efficient: classification requires only a single forward pass.

Model Features

Efficient zero-shot classification

Achieves performance comparable to cross-encoders while being more compute-efficient, because classification requires only a single forward pass.

Multi-task applicability

Suitable for topic classification, sentiment analysis, and reranking tasks in RAG workflows.

Synthetic data training

Trained on synthetic data, making it viable for commercial applications.

Model Capabilities

Zero-shot text classification

Multi-label classification

Sentiment analysis

RAG reranking

Use Cases

Text analysis

Topic classification

Performs multi-label topic classification on text, such as identifying themes like travel or dreams.

Achieves a zero-shot F1 score of 0.8650 on the IMDB dataset.

Sentiment analysis

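Determines the sentiment polarity of a text.

On IMDB, its zero-shot F1 score of 0.8650 is higher than that of Deberta-base-v3 (0.85) and SetFit BAAI/bge-small-en-v1.5 (0.86), but lower than that of Bart-large-mnli (0.89) and Comprehendo (0.90).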

Information retrieval

RAG reranking

Functions as a reranker in Retrieval-Augmented Generation workflows.

⭐ GLiClass: Generalist and Lightweight Model for Sequence Classification

GLiClass is an efficient zero-shot classifier inspired by the GLiNER work. It achieves performance comparable to a cross-encoder while being more compute-efficient, because classification is completed in a single forward pass. It can be applied to topic classification and sentiment analysis, and it can serve as a reranker in RAG pipelines. The model is trained on synthetic data and is suitable for commercial applications.

🚀 Quick Start

📦 Installation

First, you need to install the GLiClass library:

pip install gliclass

💻 Usage Examples

Basic Usage

Then, you need to initialize a model and a pipeline:

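import torch
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

# Load the pretrained model and its tokenizer from the Hugging Face Hub
model = GLiClassModel.from_pretrained("knowledgator/gliclass-base-v1.0")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-base-v1.0")

# Use the first GPU when available, otherwise fall back to the CPU
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device=device)

text = "One day I will see the world!"
labels = ["travel", "dreams", "sport", "science", "politics"]
# The pipeline returns one list of results per input text; [0] selects the results for our single text
results = pipeline(text, labels, threshold=0.5)[0]

# Print each predicted label with its score
for result in results:
    print(result["label"], "=>", result["score"])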
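
The loop above suggests that each result is a dictionary with "label" and "score" keys. Assuming that shape, a small helper such as the hypothetical top_labels below could be used to sort predictions and apply a custom cut-off after inference. The helper is not part of the gliclass library, and the sample scores are made up for illustration; they are not real model output.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def top_labels(results: List[Prediction], threshold: float = 0.5, k: int = 3) -> List[str]:
    """Return up to k label names whose score is at least `threshold`, best first."""
    kept = [r for r in results if r["score"] >= threshold]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return [r["label"] for r in kept[:k]]


# Illustrative scores only (hypothetical, not produced by the model).
sample_results = [
    {"label": "dreams", "score": 0.71},
    {"label": "travel", "score": 0.93},
    {"label": "sport", "score": 0.08},
]

print(top_labels(sample_results))  # ['travel', 'dreams']
```

Keeping this step outside the pipeline call makes it easy to try different thresholds without running inference again.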

📚 Documentation

Benchmarks

The table below shows F1 scores on several text classification datasets. None of the tested models were fine-tuned on these datasets; all were evaluated in a zero-shot setting.

Model	IMDB	AG_NEWS	Emotions
gliclass-large-v1.0 (438 M)	0.9404	0.7516	0.4874
gliclass-base-v1.0 (186 M)	0.8650	0.6837	0.4749
gliclass-small-v1.0 (144 M)	0.8650	0.6805	0.4664
Bart-large-mnli (407 M)	0.89	0.6887	0.3765
Deberta-base-v3 (184 M)	0.85	0.6455	0.5095
Comprehendo (184M)	0.90	0.7982	0.5660
SetFit BAAI/bge-small-en-v1.5 (33.4M)	0.86	0.5636	0.5754
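
The table reports F1 without stating which averaging mode was used. As a reminder of what the metric measures, here is a minimal pure-Python sketch of macro-averaged F1. Macro averaging is an assumption made for this example, and the authors may have used a different variant. The labels are toy data.

```python
from typing import List, Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1, where F1 = 2*TP / (2*TP + FP + FN)."""
    classes = sorted(set(y_true) | set(y_pred))
    if not classes:
        return 0.0
    scores: List[float] = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy sentiment labels (hypothetical), one prediction is wrong.
y_true = ["positive", "negative", "positive", "negative"]
y_pred = ["positive", "negative", "negative", "negative"]

print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

Here the "positive" class scores 2/3 and the "negative" class scores 0.8, so the macro average is about 0.7333.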

Below you can find a comparison with other GLiClass models:

Dataset	gliclass-small-v1.0-lw	gliclass-base-v1.0-lw	gliclass-large-v1.0-lw	gliclass-small-v1.0	gliclass-base-v1.0	gliclass-large-v1.0
CR	0.8886	0.9097	0.9226	0.8824	0.8942	0.9219
sst2	0.8392	0.8987	0.9247	0.8518	0.8979	0.9269
sst5	0.2865	0.3779	0.2891	0.2424	0.2789	0.3900
20_news_groups	0.4572	0.3953	0.4083	0.3366	0.3576	0.3863
spam	0.5118	0.5126	0.3642	0.4089	0.4938	0.3661
rotten_tomatoes	0.8015	0.8429	0.8807	0.7987	0.8508	0.8808
massive	0.3180	0.4635	0.5606	0.2546	0.1893	0.4376
banking	0.1768	0.4396	0.3317	0.1374	0.2077	0.2847
yahoo_topics	0.4686	0.4784	0.4760	0.4477	0.4516	0.4921
financial_phrasebank	0.8665	0.8880	0.9044	0.8901	0.8955	0.8735
imdb	0.9048	0.9351	0.9429	0.8982	0.9238	0.9333
ag_news	0.7252	0.6985	0.7559	0.7242	0.6848	0.7503
dair_emotion	0.4012	0.3516	0.3951	0.3450	0.2357	0.4013
capsotu	0.3794	0.4643	0.4749	0.3432	0.4375	0.4644
Average:	0.5732	0.6183	0.6165	0.5401	0.5571	0.6078
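
The Average row appears to be the unweighted mean over the 14 datasets. A quick check for two of the columns, with the values copied from the table above (the variable names are only for illustration):

```python
from statistics import mean

# Zero-shot F1 per dataset, copied from the table above in row order
# (CR, sst2, sst5, 20_news_groups, spam, rotten_tomatoes, massive, banking,
#  yahoo_topics, financial_phrasebank, imdb, ag_news, dair_emotion, capsotu).
small_lw = [0.8886, 0.8392, 0.2865, 0.4572, 0.5118, 0.8015, 0.3180,
            0.1768, 0.4686, 0.8665, 0.9048, 0.7252, 0.4012, 0.3794]
base = [0.8942, 0.8979, 0.2789, 0.3576, 0.4938, 0.8508, 0.1893,
        0.2077, 0.4516, 0.8955, 0.9238, 0.6848, 0.2357, 0.4375]

print(round(mean(small_lw), 4))  # 0.5732 (gliclass-small-v1.0-lw)
print(round(mean(base), 4))      # 0.5571 (gliclass-base-v1.0)
```

Because the mean is unweighted, small datasets count as much as large ones, so low scores on tasks such as banking or massive pull the average down noticeably.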

The table below shows how performance improves when the models are given a small number of examples (0 versus 8):

Model	Num Examples	sst5	spam	massive	banking	ag news	dair emotion	capsotu	Average
gliclass-small-v1.0-lw	0	0.2865	0.5118	0.318	0.1768	0.7252	0.4012	0.3794	0.3998428571
gliclass-base-v1.0-lw	0	0.3779	0.5126	0.4635	0.4396	0.6985	0.3516	0.4643	0.4725714286
gliclass-large-v1.0-lw	0	0.2891	0.3642	0.5606	0.3317	0.7559	0.3951	0.4749	0.4530714286
gliclass-small-v1.0	0	0.2424	0.4089	0.2546	0.1374	0.7242	0.345	0.3432	0.3508142857
gliclass-base-v1.0	0	0.2789	0.4938	0.1893	0.2077	0.6848	0.2357	0.4375	0.3611
gliclass-large-v1.0	0	0.39	0.3661	0.4376	0.2847	0.7503	0.4013	0.4644	0.4420571429
gliclass-small-v1.0-lw	8	0.2709	0.84026	0.62	0.6883	0.7786	0.449	0.4918	0.5912657143
gliclass-base-v1.0-lw	8	0.4275	0.8836	0.729	0.7667	0.7968	0.3866	0.4858	0.6394285714
gliclass-large-v1.0-lw	8	0.3345	0.8997	0.7658	0.848	0.84843	0.5219	0.508	0.67519
gliclass-small-v1.0	8	0.3042	0.5683	0.6332	0.7072	0.759	0.4509	0.4434	0.5523142857
gliclass-base-v1.0	8	0.3387	0.7361	0.7059	0.7456	0.7896	0.4323	0.4802	0.6040571429
gliclass-large-v1.0	8	0.4365	0.9018	0.77	0.8533	0.8509	0.5061	0.4935	0.6874428571
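
To make the effect of the 8 examples explicit, the short sketch below computes the change in the Average column between the 0-example and 8-example rows. The values are copied from the table above; the dictionary layout is only an illustration.

```python
# model name -> (average with 0 examples, average with 8 examples)
averages = {
    "gliclass-small-v1.0-lw": (0.3998428571, 0.5912657143),
    "gliclass-base-v1.0-lw": (0.4725714286, 0.6394285714),
    "gliclass-large-v1.0-lw": (0.4530714286, 0.67519),
    "gliclass-small-v1.0": (0.3508142857, 0.5523142857),
    "gliclass-base-v1.0": (0.3611, 0.6040571429),
    "gliclass-large-v1.0": (0.4420571429, 0.6874428571),
}

# Absolute gain in average F1 from adding 8 examples.
gains = {model: few - zero for model, (zero, few) in averages.items()}

# Print models from largest to smallest gain.
for model, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: +{gain:.4f}")
```

With these numbers every model improves, and gliclass-base-v1.0 gains about 0.243 in average F1, the second-largest gain after gliclass-large-v1.0 (about 0.245).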

📄 License

The model is licensed under the Apache-2.0 license.

✨ Features

Efficient Zero-Shot Classification: Achieves performance comparable to cross-encoders at a lower computational cost.
Multiple Applications: Can be used for topic classification, sentiment analysis, and as a reranker in RAG pipelines.
Commercial Use: Trained on synthetic data, so it is suitable for commercial applications.

📦 Information Table

Property	Details
Model Type	Zero-shot classifier
Training Data	MoritzLaurer/synthetic_zeroshot_mixtral_v0.1
Metrics	F1
Pipeline Tag	Zero-shot classification
Tags	text classification, zero-shot, small language models, RAG, sentiment analysis
