🚀 GIST-small-Embedding-v0
GIST-small-Embedding-v0 is a sentence-transformers model intended mainly for sentence similarity. It has been evaluated on MTEB datasets covering classification, retrieval, clustering, reranking, and semantic textual similarity (STS); the scores are listed under Performance Results.
📚 Documentation
Model Information
| Property | Details |
| --- | --- |
| Library Name | sentence-transformers |
| Pipeline Tag | sentence-similarity |
| Tags | feature-extraction, mteb, sentence-similarity, sentence-transformers |
| Model Name | GIST-small-Embedding-v0 |
| License | MIT |
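Since the card lists sentence-transformers as the library and sentence-similarity as the pipeline tag, a typical way to use the model is to encode sentences and compare the resulting vectors. The following is a minimal sketch, not an official usage snippet: the Hub identifier in `MODEL_ID` is an assumption (the card does not state it), and `embed` and `cosine_similarity` are illustrative helper names.

```python
import math
from typing import Sequence

# Assumption: this Hub identifier is not given in the card. Replace it with the
# actual repository id or a local path to the model.
MODEL_ID = "avsolatorio/GIST-small-Embedding-v0"


def embed(sentences: Sequence[str], model_id: str = MODEL_ID):
    """Encode sentences into L2-normalized embedding vectors."""
    # Imported lazily so the similarity helper works without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    return model.encode(list(sentences), normalize_embeddings=True)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Calling `embed(["first sentence", "second sentence"])` downloads the model on first use; passing the two returned vectors to `cosine_similarity` then gives a score in the range -1 to 1, where higher means more similar.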
Performance Results
The following are the performance results of the GIST-small-Embedding-v0 model on different tasks and datasets:
Classification Tasks
| Task | Dataset | Accuracy | AP | F1 |
| --- | --- | --- | --- | --- |
| Classification | MTEB AmazonCounterfactualClassification (en) | 75.26865671641791 | 38.25623793370476 | 69.26434651320257 |
| Classification | MTEB AmazonPolarityClassification | 93.232225 | 89.97936072879344 | 93.22122653806187 |
| Classification | MTEB AmazonReviewsClassification (en) | 49.715999999999994 | - | 49.169789920136076 |
| Classification | MTEB Banking77Classification | 86.64935064935065 | - | 86.61903824934998 |
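For readers unfamiliar with the three classification columns, the sketch below computes accuracy, F1, and average precision (AP) for a binary task. It is an illustration of the metric definitions only. The card does not say how F1 is averaged across classes or how the reported numbers were produced, so the binary formulation here is an assumption. AP as written needs a positive class, which would be consistent with the "-" entries on the rows that look like multi-class datasets.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """Mean of precision values taken at the rank of each positive example."""
    # Rank examples from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == positive:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0
```

These functions return values between 0 and 1, whereas the table values appear to be on a 0 to 100 scale.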
Retrieval Tasks
| Task | Dataset | MAP@1 | MAP@10 | MAP@100 | MAP@1000 | MRR@1 | MRR@10 | MRR@100 | MRR@1000 | NDCG@1 | NDCG@10 | NDCG@100 | NDCG@1000 | Precision@1 | Precision@10 | Precision@100 | Precision@1000 | Recall@1 | Recall@10 | Recall@100 | Recall@1000 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Retrieval | MTEB ArguAna | 34.922 | 50.524 | 51.247 | 51.249 | 34.922 | 50.382000000000005 | 51.104000000000006 | 51.105999999999995 | 34.922 | 59.12 | 62.083999999999996 | 62.137 | 34.922 | 8.649 | 0.991 | 0.1 | 34.922 | 86.48599999999999 | 99.14699999999999 | 99.57300000000001 |
| Retrieval | MTEB CQADupstackAndroidRetrieval | 31.843 | 43.379 | 44.946999999999996 | 45.078 | 39.199 | 49.672 | 50.321000000000005 | 50.365 | 39.199 | 50.163999999999994 | 55.418 | 57.353 | 39.199 | 9.757 | 1.552 | 0.20500000000000002 | 31.843 | 62.743 | 84.78099999999999 | 96.86099999999999 |
| Retrieval | MTEB CQADupstackEnglishRetrieval | 29.321 | 39.062999999999995 | 40.403 | 40.534 | 35.987 | 44.708999999999996 | 45.394 | 45.436 | 35.987 | 44.585 | 49.297999999999995 | 51.315 | 35.987 | 8.369 | 1.366 | 0.184 | 29.321 | 54.333 | 74.178 | 86.732 |
| Retrieval | MTEB CQADupstackGamingRetrieval | 38.811 | 51.114000000000004 | 52.22 | 52.275000000000006 | 44.389 | 54.459 | 55.208999999999996 | 55.239000000000004 | 44.389 | 56.979 | 61.266 | 62.315 | 44.389 | 9.26 | 1.226 | 0.136 | 38.811 | 70.841 | 89.218 | 96.482 |
| Retrieval | MTEB CQADupstackGisRetrieval | 25.378 | 34.311 | 35.399 | 35.482 | 27.683999999999997 | 36.575 | 37.492 | 37.556 | 27.683999999999997 | 39.247 | 44.424 | 46.478 | 27.683999999999997 | 5.989 | 0.899 | 0.11199999999999999 | 25.378 | 52.195 | 75.764 | 91.012 |
| Retrieval | MTEB CQADupstackMathematicaRetrieval | 17.326 | 25.247000000000003 | 26.473000000000003 | 26.579000000000004 | 21.393 | 30.187 | 31.089 | 31.15 | 21.393 | 30.668 | 36.543 | 39.181 | 21.393 | 5.784000000000001 | 1.001 | 0.136 | 17.326 | 38.811 | 52.195 | 75.764 |
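The retrieval columns are all cutoff metrics: each is computed over the top k results returned for a query and then averaged over queries. As a hedged illustration of what Precision@k, Recall@k, and NDCG@k measure, the sketch below scores a single ranked list against a set of relevant document ids. It assumes binary relevance, which the card does not confirm, and the document ids are made up for the example.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of all relevant documents that appear in the top k."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Discounted cumulative gain of the top k, normalized by the ideal ordering."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    # The ideal ranking places every relevant document first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```

When a query has exactly one relevant document, Precision@1, Recall@1, and NDCG@1 coincide, which matches the ArguAna row, where all three are 34.922. In the CQADupstack rows Recall@1 is lower than Precision@1, as expected when some queries have more than one relevant document.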
Clustering Tasks
| Task | Dataset | V-Measure |
| --- | --- | --- |
| Clustering | MTEB ArxivClusteringP2P | 47.623282347623714 |
| Clustering | MTEB ArxivClusteringS2S | 39.86487843524932 |
| Clustering | MTEB BiorxivClusteringP2P | 39.21904455377494 |
| Clustering | MTEB BiorxivClusteringS2S | 35.43342755570654 |
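V-measure is the harmonic mean of homogeneity (each cluster contains a single class) and completeness (each class falls into a single cluster). The following is a from-scratch sketch of that definition for illustration; it is not the evaluation code behind the numbers above, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def _entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target: Sequence[Hashable], given: Sequence[Hashable]) -> float:
    """H(target | given), estimated from co-occurrence counts."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(classes: Sequence[Hashable], clusters: Sequence[Hashable]) -> float:
    """Harmonic mean of homogeneity and completeness."""
    h_classes = _entropy(classes)
    h_clusters = _entropy(clusters)
    homogeneity = 1.0 if h_classes == 0 else 1.0 - _conditional_entropy(classes, clusters) / h_classes
    completeness = 1.0 if h_clusters == 0 else 1.0 - _conditional_entropy(clusters, classes) / h_clusters
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)
```

The score does not depend on how clusters are numbered: `v_measure([0, 0, 1, 1], [1, 1, 0, 0])` is a perfect clustering even though the label values are swapped.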
Reranking Task
| Task | Dataset | MAP | MRR |
| --- | --- | --- | --- |
| Reranking | MTEB AskUbuntuDupQuestions | 62.3290291318171 | 75.2379853141626 |
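In a reranking task, each query comes with a candidate list that the model reorders, and the result is scored with mean average precision (MAP) and mean reciprocal rank (MRR). A minimal sketch of both metrics follows, assuming each query is represented as a list of booleans marking which reranked candidates are relevant; that representation is chosen here for illustration and is not taken from the card.

```python
from typing import Callable, Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Mean of precision values measured at each relevant position."""
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def reciprocal_rank(ranked_relevance: Sequence[bool]) -> float:
    """1 / rank of the first relevant candidate, or 0 if there is none."""
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def mean_over_queries(
    metric: Callable[[Sequence[bool]], float],
    queries: Sequence[Sequence[bool]],
) -> float:
    """Average a per-query metric over all queries (gives MAP or MRR)."""
    return sum(metric(q) for q in queries) / len(queries)
```

MRR only looks at the first relevant candidate, while MAP rewards placing every relevant candidate high, so MRR is never lower than MAP on the same rankings. That is consistent with the AskUbuntuDupQuestions row above.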
STS Task
| Task | Dataset | Cosine Similarity (Pearson) | Cosine Similarity (Spearman) | Euclidean (Pearson) | Euclidean (Spearman) | Manhattan (Pearson) | Manhattan (Spearman) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| STS | MTEB BIOSSES | 88.52002953574285 | 86.98752423842483 | 86.89442688314197 | 86.88631711307471 | 87.03723618507175 | 86.76041062975224 |
📄 License
This project is licensed under the MIT License.