🚀 INSTRUCTOR
INSTRUCTOR is a text embedding model designed for a variety of text-related tasks, such as sentence similarity, text classification, and information retrieval, and it delivers reliable performance across the MTEB datasets and tasks reported below.
✨ Features
- Diverse Task Support: Handles classification, retrieval, clustering, reranking, and semantic textual similarity (STS) tasks; see the usage sketch after this list.
- Multiple Datasets: Tested on a wide range of datasets, including MTEB AmazonCounterfactualClassification, MTEB AmazonPolarityClassification, and many others.
- Rich Metrics: Evaluated with metrics such as accuracy, AP, F1, MAP, and MRR.
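As a quick orientation, the sketch below shows instruction-prefixed encoding with the InstructorEmbedding package. This card does not pin a specific checkpoint, so the `hkunlp/instructor-large` name and the instruction wording are illustrative assumptions.

```python
# A minimal sketch, assuming the InstructorEmbedding package and the
# hkunlp/instructor-large checkpoint (not pinned by this card).
from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR("hkunlp/instructor-large")

# INSTRUCTOR pairs each input with a natural-language instruction
# describing the task and domain.
sentence = "3D ActionSLAM: wearable person tracking in multi-floor environments"
instruction = "Represent the Science title:"

embedding = model.encode([[instruction, sentence]])
print(embedding.shape)  # e.g. (1, 768) for instructor-large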
📄 License
The model is licensed under the Apache-2.0 license.
📚 Documentation
Model Information
| Property | Details |
|---|---|
| Model Name | INSTRUCTOR |
| Pipeline Tag | sentence-similarity |
| Tags | text-embedding, embeddings, information-retrieval, beir, text-classification, language-model, text-clustering, text-semantic-similarity, text-evaluation, prompt-retrieval, text-reranking, sentence-transformers, feature-extraction, sentence-similarity, transformers, t5, English, Sentence Similarity, natural_questions, ms_marco, fever, hotpot_qa, mteb |
| Language | en |
| Inference | false |
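Given the sentence-similarity pipeline tag, here is a minimal similarity sketch that scores two sentences by cosine similarity of their instruction-prefixed embeddings. The checkpoint and instruction text are assumptions carried over from the sketch above.

```python
# A minimal similarity sketch; the instruction text and checkpoint
# are illustrative assumptions, not pinned by this card.
from InstructorEmbedding import INSTRUCTOR
from sklearn.metrics.pairwise import cosine_similarity

model = INSTRUCTOR("hkunlp/instructor-large")

texts_a = [["Represent the sentence for semantic similarity:",
            "A man is playing a guitar."]]
texts_b = [["Represent the sentence for semantic similarity:",
            "Someone plays an acoustic guitar."]]

emb_a = model.encode(texts_a)
emb_b = model.encode(texts_b)

# Cosine similarity in [-1, 1]; higher means more similar.
print(cosine_similarity(emb_a, emb_b)[0][0])
```

The Spearman score in the STS table below is a rank correlation computed over exactly these cosine similarities.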
Model Results
Classification Tasks
| Dataset | Accuracy | AP | F1 |
|---|---|---|---|
| MTEB AmazonCounterfactualClassification (en) | 88.13432835820896 | 59.298209334395665 | 83.31769058643586 |
| MTEB AmazonPolarityClassification | 91.526375 | 88.16327709705504 | 91.51095801287843 |
| MTEB AmazonReviewsClassification (en) | 47.856 | - | 45.41490917650942 |
| MTEB Banking77Classification | 78.51298701298701 | - | 77.49041754069235 |
Retrieval Tasks
| Dataset | MAP@1 | MAP@10 | MAP@100 | MAP@1000 | MRR@1 | MRR@10 | MRR@100 | MRR@1000 | NDCG@1 | NDCG@10 | NDCG@100 | NDCG@1000 | Precision@1 | Precision@10 | Precision@100 | Precision@1000 | Recall@1 | Recall@10 | Recall@100 | Recall@1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MTEB ArguAna | 31.223 | 47.947 | 48.742 | 48.745 | 32.432 | 48.4 | 49.202 | 49.205 | 31.223 | 57.045 | 60.175 | 60.233 | 31.223 | 8.599 | 0.991 | 0.1 | 31.223 | 85.989 | 99.075 | 99.502 |
| MTEB CQADupstackAndroidRetrieval | 35.803 | 48.848 | 50.5 | 50.603 | 44.635 | 55.593 | 56.17 | 56.195 | 44.635 | 55.899 | 60.958 | 62.302 | 44.635 | 10.787 | 1.658 | 0.213 | 35.803 | 68.657 | 89.772 | 97.67 |
| MTEB CQADupstackEnglishRetrieval | 33.706 | 44.896 | 46.299 | 46.44 | 41.592 | 50.529 | 51.22 | 51.258 | 41.592 | 50.772 | 55.383 | 57.288 | 41.592 | 9.516 | 1.541 | 0.2 | 33.706 | 61.353 | 80.182 | 91.896 |
| MTEB CQADupstackGamingRetrieval | 44.424 | 57.169 | 58.202 | 58.242 | 50.47 | 60.489 | 61.096 | 61.112 | 50.47 | 63.072 | 66.964 | 67.659 | 50.47 | 10.019 | 1.29 | 0.139 | 44.424 | 77.02 | 93.738 | 98.451 |
| MTEB CQADupstackGisRetrieval | 26.294 | 34.503 | 35.641 | 35.724 | 28.362 | 36.53 | 37.541 | 37.602 | 28.362 | 39.514 | 44.815 | 46.839 | 28.362 | 6.102 | 0.913 | 0.114 | 26.294 | 53.098 | 76.877 | 91.834 |
| MTEB CQADupstackMathematicaRetrieval | 16.407 | 25.186 | 26.533 | 26.658 | 20.522 | 29.522 | 30.644 | 30.713 | 20.522 | 30.656 | 36.865 | 39.675 | 20.522 | 5.796 | 1.027 | - | 16.407 | - | - | - |
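For retrieval, queries and documents are typically encoded with different instructions. A hedged sketch of that pattern follows; the instruction wording, checkpoint, and example texts are all illustrative assumptions.

```python
# A hedged retrieval sketch: queries and documents get different
# instructions. Checkpoint and instruction wording are assumptions.
import numpy as np
from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR("hkunlp/instructor-large")

query = [["Represent the question for retrieving supporting documents:",
          "where is the food stored in a yam plant"]]
corpus = [
    ["Represent the document for retrieval:",
     "Yams store starch in enlarged underground tubers."],
    ["Represent the document for retrieval:",
     "Capitalism has been dominant in the Western world since feudalism ended."],
]

q_emb = model.encode(query)   # shape (1, dim)
d_emb = model.encode(corpus)  # shape (2, dim)

# Rank documents by cosine similarity to the query.
scores = (q_emb @ d_emb.T) / (
    np.linalg.norm(q_emb, axis=1, keepdims=True) *
    np.linalg.norm(d_emb, axis=1).reshape(1, -1)
)
print(np.argsort(-scores[0]))  # best-matching document indices first
```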
Clustering Tasks
| Dataset | V-Measure |
|---|---|
| MTEB ArxivClusteringP2P | 43.1621946393635 |
| MTEB ArxivClusteringS2S | 32.56417132407894 |
| MTEB BiorxivClusteringP2P | 37.61848554098577 |
| MTEB BiorxivClusteringS2S | 31.32623280148178 |
Reranking Task
| Dataset | MAP | MRR |
|---|---|---|
| MTEB AskUbuntuDupQuestions | 64.29539304390207 | 76.44484017060196 |
STS Task
| Dataset | Cosine Similarity Spearman |
|---|---|
| MTEB BIOSSES | 84.38746499431112 |