SetFit Model - paraphrase-MiniLM-L6-v2 Open-source Model - Free for Few-shot Learning in Text Classification

Developed by hleAtKeeper

This is an efficient few-shot learning model based on SetFit, used for text classification tasks. It uses sentence-transformers/paraphrase-MiniLM-L6-v2 as the sentence embedding model and LogisticRegression for classification.

Tags: Text Classification, Few-shot learning, Efficient text classification, Command-line security analysis

Downloads 418

Release Time: 4/15/2025

Model Overview

This model combines the SetFit framework with a pre-trained sentence embedding model. It targets text classification tasks and is particularly suitable for few-shot learning scenarios.

Model Features

Efficient few-shot learning

It fine-tunes the sentence embedding model with contrastive learning, which lets it learn from a small number of labeled samples.

Accurate classification

It reaches an evaluation accuracy of 99.15% on its text classification task.

Two-stage training

The sentence embedding model is fine-tuned first, and the classification head is then trained on embeddings from the fine-tuned model.

Model Capabilities

Text classification

Few-shot learning

Command statement classification

Use Cases

System command classification

Command risk level classification

Classifies Linux system commands by risk level (Critical, High, Medium, or Low).

Accuracy: 99.15%

🚀 SetFit with sentence-transformers/paraphrase-MiniLM-L6-v2

This is a SetFit model for text classification. It uses sentence-transformers/paraphrase-MiniLM-L6-v2 as its Sentence Transformer embedding model and a LogisticRegression instance as the classification head. Training follows SetFit's few-shot procedure in two steps: first, the Sentence Transformer is fine-tuned with contrastive learning; second, the classification head is trained on features from the fine-tuned Sentence Transformer.
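The contrastive step works on pairs of labeled examples rather than on single examples. The following is a minimal sketch of how such pairs could be built, assuming that two texts with the same label form a positive pair (target 1.0) and two texts with different labels form a negative pair (target 0.0). The helper name `make_contrastive_pairs` is hypothetical and is not part of the SetFit API; the commands and labels come from the Model Labels table in this card.

```python
from itertools import combinations


def make_contrastive_pairs(samples):
    """Build (text_a, text_b, target) triples from (text, label) samples.

    Same-label pairs get target 1.0, different-label pairs get target 0.0.
    Hypothetical helper for illustration; not the SetFit implementation.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(samples, 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# A few labeled commands taken from the Model Labels table.
samples = [
    ("reboot", "Low"),
    ("apt-get update", "Low"),
    ("chmod 777 /tmp", "Medium"),
    ("history -c", "Critical"),
]

pairs = make_contrastive_pairs(samples)
for text_a, text_b, target in pairs:
    print(f"{target:.1f}  {text_a!r} / {text_b!r}")
```

Four samples already yield six pairs, which is why pair-based training can extract a useful signal from a small labeled set.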

✨ Features

Few-shot Learning: Trained using an efficient few-shot learning technique.
Text Classification: Capable of performing text classification tasks.

📦 Installation

First, install the SetFit library:

pip install setfit

💻 Usage Examples

Basic Usage

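from setfit import SetFitModel

# Download from the 🤗 Hub ("setfit_model_id" is a placeholder for the model's Hub ID)
model = SetFitModel.from_pretrained("setfit_model_id")
# Run inference on a single command string
preds = model("systemctl stop apache2")
print(preds)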
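For the command risk use case, predictions usually need to be filtered by severity. The sketch below shows one way to do that, assuming the predictor accepts a list of strings and returns the label strings used in this card (Low, Medium, High, Critical). The names `SEVERITY`, `flag_commands`, and `fake_predict` are hypothetical; `fake_predict` is a stand-in so the example runs without downloading a model.

```python
# Severity order assumed from the four labels listed in this card.
SEVERITY = {"Low": 0, "Medium": 1, "High": 2, "Critical": 3}


def flag_commands(commands, predict, min_level="High"):
    """Return (command, label) pairs predicted at or above min_level.

    `predict` is any callable that maps a list of strings to a sequence
    of label strings.
    """
    labels = [str(label) for label in predict(commands)]
    threshold = SEVERITY[min_level]
    return [
        (command, label)
        for command, label in zip(commands, labels)
        if SEVERITY[label] >= threshold
    ]


def fake_predict(commands):
    """Stand-in predictor; labels follow the examples in this card."""
    lookup = {
        "reboot": "Low",
        "history -c": "Critical",
        "bash /tmp/exploit.sh": "High",
    }
    return [lookup[command] for command in commands]


flagged = flag_commands(
    ["reboot", "history -c", "bash /tmp/exploit.sh"], fake_predict
)
print(flagged)
```

With a loaded model, passing `model.predict` in place of `fake_predict` should work under the same assumption about the returned labels.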

📚 Documentation

Model Details

Model Description

Property	Details
Model Type	SetFit
Sentence Transformer body	sentence-transformers/paraphrase-MiniLM-L6-v2
Classification head	a LogisticRegression instance
Maximum Sequence Length	128 tokens
Number of Classes	4 classes

Model Sources

Repository: SetFit on GitHub
Paper: Efficient Few-Shot Learning Without Prompts
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts

Model Labels

Label	Examples
Medium	'chmod 777 /tmp' 'nmap -p 22,80,443 192.168.1.1' "grep -r 'root' /etc"
Low	'reboot' 'apt-get update' 'cd /home/user'
Critical	'history -c' "echo 'export HISTFILE=/dev/null' >> ~/.bashrc" "ssh-keygen -t rsa -f ~/.ssh/id_rsa -q -N ''"
High	"echo 'export HISTFILE=/dev/null' >> ~/.bashrc" 'bash /tmp/malicious.sh' 'bash /tmp/exploit.sh'

Evaluation

Metrics

Label	Accuracy
all	0.9915

Training Details

Training Set Metrics

Training set	Min	Median	Max
Word count	1	3.1356	11

Label	Training Sample Count
Low	42
Medium	17
High	40
Critical	19

Training Hyperparameters

batch_size: (16, 16)
num_epochs: (1, 1)
max_steps: -1
sampling_strategy: oversampling
body_learning_rate: (2e-05, 1e-05)
head_learning_rate: 0.01
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: False
warmup_proportion: 0.1
l2_weight: 0.01
seed: 42
eval_max_steps: -1
load_best_model_at_end: True
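The `loss: CosineSimilarityLoss` setting above determines what the embedding fine-tuning stage optimizes. As a minimal sketch, assuming the loss is the mean squared error between the cosine similarity of two embeddings and the pair's target (1.0 for same label, 0.0 for different labels), it can be written in plain Python as follows. The function names and the two-dimensional toy vectors are illustrative only; real training operates on batches of model embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(pairs):
    """Mean squared error between cosine similarity and target per pair."""
    errors = [
        (cosine_similarity(u, v) - target) ** 2 for u, v, target in pairs
    ]
    return sum(errors) / len(errors)


toy_pairs = [
    ([1.0, 0.0], [1.0, 0.0], 1.0),  # identical vectors, same label: error 0
    ([1.0, 0.0], [0.0, 1.0], 0.0),  # orthogonal vectors, different labels: error 0
    ([1.0, 0.0], [1.0, 0.0], 0.0),  # identical vectors, different labels: error 1
]

loss = cosine_similarity_loss(toy_pairs)
print(loss)
```

Only the third toy pair contributes to the loss, because it places two differently labeled texts at the same point in embedding space.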

Training Results

Epoch	Step	Training Loss	Validation Loss
0.0016	1	0.4702	-
0.0806	50	0.2501	-
0.1613	100	0.1859	-
0.2419	150	0.1318	-
0.3226	200	0.1157	-
0.4032	250	0.095	-
0.4839	300	0.0902	-
0.5645	350	0.0796	-
0.6452	400	0.0663	-
0.7258	450	0.0539	-
0.8065	500	0.045	-
0.8871	550	0.0378	-
0.9677	600	0.0332	-
1.0	620	-	0.1862
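The 620 steps in the final row can be cross-checked against the training sample counts and the batch size of 16. The sketch below is a back-of-the-envelope calculation, assuming that every pair of training samples is a candidate pair and that `oversampling` repeats the smaller of the positive and negative pair sets until it matches the larger one. These assumptions are not stated in this card, but the result agrees with the reported step count.

```python
from math import ceil, comb

# Training sample counts and batch size as reported in this card.
counts = {"Low": 42, "Medium": 17, "High": 40, "Critical": 19}
batch_size = 16

total_samples = sum(counts.values())
all_pairs = comb(total_samples, 2)

# Positive pairs share a label; negative pairs do not.
positive_pairs = sum(comb(n, 2) for n in counts.values())
negative_pairs = all_pairs - positive_pairs

# Assumption: oversampling pads the smaller set up to the size of the larger.
pairs_per_epoch = 2 * max(positive_pairs, negative_pairs)
steps_per_epoch = ceil(pairs_per_epoch / batch_size)

print(total_samples, positive_pairs, negative_pairs, pairs_per_epoch, steps_per_epoch)
# prints: 118 1948 4955 9910 620
```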

Framework Versions

Python: 3.13.2
SetFit: 1.1.2
Sentence Transformers: 4.0.2
Transformers: 4.51.0
PyTorch: 2.6.0
Datasets: 3.5.0
Tokenizers: 0.21.1

📄 License

Citation

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
