Compared to our previous zero-shot classifier based on BETO, zero-shot SELECTRA is much more lightweight. As shown in the Metrics section, the small version (with 5 times fewer parameters) performs slightly worse, while the medium version (with 3 times fewer parameters) outperforms the BETO-based zero-shot classifier.
🚀 Quick Start
Get started with the Zero-shot SELECTRA classifier right away.
✨ Features
Lightweight: Significantly reduces the number of parameters compared to the previous BETO-based classifier.
High Performance: The medium version outperforms the BETO-based zero-shot classifier.
Zero-shot Classification: Enables classifications without the need for task-specific training data.
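Under the hood, NLI-based zero-shot classifiers like this one score each candidate label by treating it as an entailment hypothesis and, in single-label mode, taking a softmax over the per-label entailment logits. A minimal sketch of that scoring step, with made-up logits (the function name is ours, not part of the Transformers API):

```python
import math

def zero_shot_scores(entailment_logits):
    """Softmax over one entailment logit per candidate label
    (mirrors the pipeline's default single-label behavior)."""
    exps = [math.exp(x) for x in entailment_logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for three candidate labels; the highest logit
# yields the highest probability.
scores = zero_shot_scores([2.0, 0.5, -1.0])
```

The scores sum to 1 across labels, which is why the pipeline's output reads as a probability distribution over the candidate labels.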
📦 Installation
The classifier runs on the 🤗 Transformers library, so installing it is enough: pip install transformers
💻 Usage Examples
Basic Usage
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="Recognai/zeroshot_selectra_medium")

classifier(
    "El autor se perfila, a los 50 años de su muerte, como uno de los grandes de su siglo",
    candidate_labels=["cultura", "sociedad", "economia", "salud", "deportes"],
    hypothesis_template="Este ejemplo es {}."
)
"""Output
{'sequence': 'El autor se perfila, a los 50 años de su muerte, como uno de los grandes de su siglo',
'labels': ['sociedad', 'cultura', 'economia', 'salud', 'deportes'],
'scores': [0.6450043320655823,
0.16710571944713593,
0.08507631719112396,
0.0759836807847023,
0.026829993352293968]}
"""
⚠️ Important Note
The hypothesis_template parameter is important and should be in Spanish. In the widget on the right, this parameter is set to its default English value, "This example is {}.", so the widget's results will differ from those shown above.
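Concretely, the pipeline builds one NLI hypothesis per candidate label by filling the template's {} placeholder, which is why the template's language should match the input text:

```python
template = "Este ejemplo es {}."
candidate_labels = ["cultura", "sociedad"]

# One Spanish hypothesis per label, as the pipeline constructs them.
hypotheses = [template.format(label) for label in candidate_labels]
```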
📚 Documentation
Demo and tutorial
If you want to see this model in action, we have created a basic tutorial using Rubrix, a free and open-source tool to explore, annotate, and monitor data for NLP.
The tutorial shows you how to evaluate this classifier for news categorization in Spanish, and how it could be used to build a training set for a supervised classifier (which might be useful if you want to obtain more precise results or improve the model over time).
XNLI: The stated accuracy refers to the test portion of the XNLI dataset, after finetuning the model on the training portion.
MLSUM: For this accuracy, we take the test set of the MLSUM dataset and classify the summaries into 5 selected labels. For details, check out our evaluation notebook.
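The metric in both cases is presumably plain top-1 accuracy: the fraction of examples whose highest-scoring label matches the gold label. A minimal sketch of that computation (function name and example labels are ours):

```python
def top1_accuracy(predicted_labels, gold_labels):
    """Fraction of examples where the top predicted label equals the gold label."""
    assert len(predicted_labels) == len(gold_labels)
    correct = sum(p == g for p, g in zip(predicted_labels, gold_labels))
    return correct / len(gold_labels)

# Hypothetical predictions vs. gold labels: 2 of 3 match.
acc = top1_accuracy(["sociedad", "cultura", "salud"],
                    ["sociedad", "economia", "salud"])
```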