SetFit with sentence-transformers/all-MiniLM-L6-v2 on sst2
This is a SetFit model trained on the sst2 dataset for text classification, leveraging sentence-transformers/all-MiniLM-L6-v2 and LogisticRegression.
Quick Start
This is a SetFit model trained on the sst2 dataset that can be used for text classification. It uses sentence-transformers/all-MiniLM-L6-v2 as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.
The model has been trained using an efficient few-shot learning technique that involves two steps (sketched in code after this list):
- Fine-tuning a Sentence Transformer with contrastive learning.
- Training a classification head with features from the fine-tuned Sentence Transformer.
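For illustration only, the two steps can be approximated outside of SetFit with sentence-transformers and scikit-learn. This is a minimal sketch under assumptions: the four example sentences and the naive all-pairs construction are made up for the demonstration, whereas SetFit itself samples contrastive pairs from the few-shot training set according to the sampling strategy listed under Training Details.

```python
from sentence_transformers import InputExample, SentenceTransformer, losses
from sklearn.linear_model import LogisticRegression
from torch.utils.data import DataLoader

# Tiny illustrative few-shot set (hypothetical examples, not the actual training data)
texts = [
    "a fast , funny , highly enjoyable movie .",          # positive
    "to serve the work especially well",                  # positive
    "a tough pill to swallow",                            # negative
    "typical hollywood disregard for historical truth",   # negative
]
labels = [1, 1, 0, 0]

body = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Step 1: contrastive fine-tuning of the Sentence Transformer body.
# Pairs with the same label get target similarity 1.0, different labels get 0.0.
pairs = [
    InputExample(texts=[texts[i], texts[j]], label=float(labels[i] == labels[j]))
    for i in range(len(texts))
    for j in range(i + 1, len(texts))
]
loader = DataLoader(pairs, shuffle=True, batch_size=16)
loss = losses.CosineSimilarityLoss(body)
body.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=1, show_progress_bar=False)

# Step 2: train a LogisticRegression head on embeddings from the fine-tuned body.
head = LogisticRegression()
head.fit(body.encode(texts), labels)

print(head.predict(body.encode(["an utterly charming and engaging film ."])))
```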
Features
- Few-shot Learning: Trained with an efficient few-shot learning technique, reducing the need for large amounts of training data.
- Text Classification: Specifically designed for text classification tasks on the sst2 dataset.
- Sentence Transformer: Utilizes sentence-transformers/all-MiniLM-L6-v2 for embedding.
- Logistic Regression: Employs LogisticRegression for classification.
Installation
First install the SetFit library:
```bash
pip install setfit
```
Usage Examples
Basic Usage
```python
from setfit import SetFitModel

# Download the model from the Hugging Face Hub
model = SetFitModel.from_pretrained("tomaarsen/setfit-all-MiniLM-L6-v2-sst2-8-shot")

# Run inference on a single sentence
preds = model("a fast , funny , highly enjoyable movie . ")
```
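SetFitModel also supports batched inference via predict and, because the head is a LogisticRegression, per-class probabilities via predict_proba. A short example:

```python
from setfit import SetFitModel

model = SetFitModel.from_pretrained("tomaarsen/setfit-all-MiniLM-L6-v2-sst2-8-shot")

# Batched inference: pass a list of sentences
sentences = [
    "a fast , funny , highly enjoyable movie .",
    "a tough pill to swallow and",
]
preds = model.predict(sentences)

# Per-class probabilities from the LogisticRegression head
probs = model.predict_proba(sentences)
print(preds)
print(probs)
```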
Documentation
Model Details
Model Description
- Model Type: SetFit
- Sentence Transformer body: sentence-transformers/all-MiniLM-L6-v2
- Classification head: a LogisticRegression instance
- Language: English
- Training Dataset: sst2
Model Sources
- Repository: SetFit on GitHub (https://github.com/huggingface/setfit)
- Paper: Efficient Few-Shot Learning Without Prompts (https://arxiv.org/abs/2209.11055)
Model Labels
| Label    | Examples |
|:---------|:---------|
| negative | <ul><li>'a tough pill to swallow and '</li><li>'indignation '</li><li>'that the typical hollywood disregard for historical truth and realism is at work here '</li></ul> |
| positive | <ul><li>"a moving experience for people who have n't read the book "</li><li>'in the best possible senses of both those words '</li><li>'to serve the work especially well '</li></ul> |
Evaluation
Metrics
| Label | Accuracy |
|:------|:---------|
| all   | 0.7513   |
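The card does not record which split this accuracy was computed on. As a hedged sketch, the figure can be re-checked against the sst2 validation split; depending on how the head was fitted, predict may return label strings or integer ids, so both are handled:

```python
from datasets import load_dataset
from setfit import SetFitModel

model = SetFitModel.from_pretrained("tomaarsen/setfit-all-MiniLM-L6-v2-sst2-8-shot")
dataset = load_dataset("sst2", split="validation")

# Map string labels to sst2's integer ids; pass integer predictions through unchanged
label2id = {"negative": 0, "positive": 1}
raw_preds = model.predict(dataset["sentence"])
preds = [label2id[p] if isinstance(p, str) else int(p) for p in raw_preds]

accuracy = sum(p == r for p, r in zip(preds, dataset["label"])) / len(preds)
print(f"accuracy: {accuracy:.4f}")
```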
Training Details
Training Set Metrics
| Training set | Min | Median  | Max |
|:-------------|:----|:--------|:----|
| Word count   | 2   | 10.2812 | 36  |

| Label    | Training Sample Count |
|:---------|:----------------------|
| negative | 32                    |
| positive | 32                    |
Training Hyperparameters
- batch_size: (16, 16)
- num_epochs: (3, 3)
- max_steps: -1
- sampling_strategy: oversampling
- body_learning_rate: (2e-05, 1e-05)
- head_learning_rate: 0.01
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: False
- use_amp: False
- warmup_proportion: 0.1
- seed: 42
- load_best_model_at_end: True
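With SetFit 1.0, these values correspond to fields of setfit.TrainingArguments. The sketch below shows how such a run could be set up; the per-class sampling size and the column mapping are assumptions for illustration and are not part of the hyperparameter list above:

```python
from datasets import load_dataset
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, Trainer, TrainingArguments, sample_dataset

# Assumption: sample a small per-class subset of sst2 (32 per label, matching the
# training sample counts listed above).
dataset = load_dataset("sst2")
train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=32)
eval_dataset = dataset["validation"]

model = SetFitModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

args = TrainingArguments(
    batch_size=(16, 16),
    num_epochs=(3, 3),
    max_steps=-1,
    sampling_strategy="oversampling",
    body_learning_rate=(2e-05, 1e-05),
    head_learning_rate=0.01,
    loss=CosineSimilarityLoss,
    margin=0.25,
    end_to_end=False,
    use_amp=False,
    warmup_proportion=0.1,
    seed=42,
    # Assumes evaluation runs during training; the results table below logs a
    # validation loss every 20 steps.
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    # sst2 stores the text in a "sentence" column; SetFit expects "text"
    column_mapping={"sentence": "text", "label": "label"},
)
trainer.train()
```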
Training Results
| Epoch | Step | Training Loss | Validation Loss |
|:-----:|:----:|:-------------:|:---------------:|
| 0.0076 | 1 | 0.3787 | - |
| 0.0758 | 10 | 0.2855 | - |
| 0.1515 | 20 | 0.3458 | 0.29 |
| 0.2273 | 30 | 0.2496 | - |
| 0.3030 | 40 | 0.2398 | 0.2482 |
| 0.3788 | 50 | 0.2068 | - |
| 0.4545 | 60 | 0.2471 | 0.244 |
| 0.5303 | 70 | 0.2053 | - |
| **0.6061** | **80** | **0.1802** | **0.2361** |
| 0.6818 | 90 | 0.0767 | - |
| 0.7576 | 100 | 0.0279 | 0.2365 |
| 0.8333 | 110 | 0.0192 | - |
| 0.9091 | 120 | 0.0095 | 0.2527 |
| 0.9848 | 130 | 0.0076 | - |
| 1.0606 | 140 | 0.0082 | 0.2651 |
| 1.1364 | 150 | 0.0068 | - |
| 1.2121 | 160 | 0.0052 | 0.2722 |
| 1.2879 | 170 | 0.0029 | - |
| 1.3636 | 180 | 0.0042 | 0.273 |
| 1.4394 | 190 | 0.0026 | - |
| 1.5152 | 200 | 0.0036 | 0.2761 |
| 1.5909 | 210 | 0.0044 | - |
| 1.6667 | 220 | 0.0027 | 0.2796 |
| 1.7424 | 230 | 0.0025 | - |
| 1.8182 | 240 | 0.0025 | 0.2817 |
| 1.8939 | 250 | 0.003 | - |
| 1.9697 | 260 | 0.0026 | 0.2817 |
| 2.0455 | 270 | 0.0035 | - |
| 2.1212 | 280 | 0.002 | 0.2816 |
| 2.1970 | 290 | 0.0023 | - |
| 2.2727 | 300 | 0.0016 | 0.2821 |
| 2.3485 | 310 | 0.0023 | - |
| 2.4242 | 320 | 0.0015 | 0.2838 |
| 2.5 | 330 | 0.0014 | - |
| 2.5758 | 340 | 0.002 | 0.2842 |
| 2.6515 | 350 | 0.002 | - |
| 2.7273 | 360 | 0.0013 | 0.2847 |
| 2.8030 | 370 | 0.0009 | - |
| 2.8788 | 380 | 0.0018 | 0.2857 |
| 2.9545 | 390 | 0.0016 | - |
- The bold row denotes the saved checkpoint.
Environmental Impact
Carbon emissions were measured using CodeCarbon.
- Carbon Emitted: 0.003 kg of CO2
- Hours Used: 0.072 hours
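As a rough sketch of how such a figure is obtained, CodeCarbon's EmissionsTracker can wrap the training call; the train() function below is a placeholder for the actual workload:

```python
from codecarbon import EmissionsTracker

def train():
    # Placeholder for the actual training loop (e.g., the Trainer sketch above)
    pass

tracker = EmissionsTracker()
tracker.start()
try:
    train()
finally:
    emissions_kg = tracker.stop()  # emissions in kg of CO2-equivalents

print(f"Carbon emitted: {emissions_kg:.3f} kg CO2eq")
```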
Training Hardware
- On Cloud: No
- GPU Model: 1 x NVIDIA GeForce RTX 3090
- CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
- RAM Size: 31.78 GB
Framework Versions
- Python: 3.9.16
- SetFit: 1.0.0.dev0
- Sentence Transformers: 2.2.2
- Transformers: 4.29.0
- PyTorch: 1.13.1+cu117
- Datasets: 2.15.0
- Tokenizers: 0.13.3
Technical Details
The model uses an efficient few-shot learning technique. First, it fine-tunes a Sentence Transformer with contrastive learning. Then, it trains a classification head with features from the fine-tuned Sentence Transformer. This approach allows the model to achieve good performance with a relatively small amount of training data.
License
This model is licensed under the Apache 2.0 license.
Citation
BibTeX
```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
  doi = {10.48550/ARXIV.2209.11055},
  url = {https://arxiv.org/abs/2209.11055},
  author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
  title = {Efficient Few-Shot Learning Without Prompts},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}
```