🚀 GIST-Embedding-v0
This model is built on the sentence-transformers library and targets sentence-similarity tasks. It has been evaluated on multiple datasets from the MTEB benchmark, covering classification, retrieval, clustering, reranking, semantic textual similarity, and bitext mining.
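Sentence similarity with an embedding model usually comes down to comparing vectors, most often by cosine similarity. The sketch below shows that comparison step in plain Python. The three-dimensional vectors are hypothetical stand-ins; in practice they would come from the model, for example through the `encode` method of a sentence-transformers `SentenceTransformer` object. The helper names `cosine_similarity` and `rank_by_similarity` are illustrative and not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, candidate_vecs):
    """Return candidate indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, c) for c in candidate_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical 3-dimensional embeddings, for illustration only; real
# sentence embeddings have many more dimensions.
query = [1.0, 0.0, 1.0]
candidates = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partially aligned with the query
]

order = rank_by_similarity(query, candidates)
print(order)
```

Because cosine similarity ignores vector length, the second candidate ranks first even though its magnitude differs from the query's.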
📚 Documentation
Model Information
| Property | Details |
|---|---|
| Library Name | sentence-transformers |
| Pipeline Tag | sentence-similarity |
| Tags | feature-extraction, mteb, sentence-similarity, sentence-transformers |
| Model Name | GIST-Embedding-v0 |
| License | MIT |
Performance Metrics
The model has been evaluated on the following MTEB tasks and datasets. Detailed metrics are grouped by task type:
Classification Tasks
| Dataset | Accuracy | AP | F1 |
|---|---|---|---|
| MTEB AmazonCounterfactualClassification (en) | 75.95522388059702 | 38.940434354439276 | 69.88686275888114 |
| MTEB AmazonPolarityClassification | 93.51357499999999 | 90.30414241486682 | 93.50552829047328 |
| MTEB AmazonReviewsClassification (en) | 50.446000000000005 | - | 49.76432659699279 |
| MTEB Banking77Classification | 87.29545454545453 | - | 87.26415991342238 |
Retrieval Tasks
| Dataset | MAP@1 | MAP@10 | MAP@100 | MAP@1000 | MRR@1 | MRR@10 | MRR@100 | MRR@1000 | NDCG@1 | NDCG@10 | NDCG@100 | NDCG@1000 | Precision@1 | Precision@10 | Precision@100 | Precision@1000 | Recall@1 | Recall@10 | Recall@100 | Recall@1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MTEB ArguAna | 38.265 | 54.236 | 54.81399999999999 | 54.81700000000001 | 38.265 | 54.152 | 54.730000000000004 | 54.733 | 38.265 | 62.62 | 64.96600000000001 | 65.035 | 38.265 | 8.919 | 0.991 | 0.1 | 38.265 | 89.189 | 99.14699999999999 | 99.644 |
| MTEB CQADupstackAndroidRetrieval | 33.979 | 46.275 | 47.975 | 48.089 | 42.346000000000004 | 53.013 | 53.717000000000006 | 53.749 | 42.346000000000004 | 53.179 | 58.458 | 60.057 | 42.346000000000004 | 10.386 | 1.635 | 0.20600000000000002 | 33.979 | 65.553 | 87.18599999999999 | 97.25200000000001 |
| MTEB CQADupstackEnglishRetrieval | 31.529 | 42.219 | 43.408 | 43.544 | 39.873 | 48.25 | 48.867 | 48.908 | 39.873 | 47.933 | 52.156000000000006 | 54.238 | 39.873 | 9.032 | 1.419 | 0.192 | 31.529 | 57.925000000000004 | 75.89 | 89.007 |
| MTEB CQADupstackGamingRetrieval | 41.289 | 54.494 | 55.494 | 55.545 | 47.335 | 57.772 | 58.428000000000004 | 58.453 | 47.335 | 60.382999999999996 | 64.294 | 65.211 | 47.335 | 9.724 | 1.26 | 0.13699999999999998 | 41.289 | 74.36399999999999 | 91.19800000000001 | 97.508 |
| MTEB CQADupstackGisRetrieval | 28.816999999999997 | 37.856 | 38.824 | 38.902 | 31.073 | 39.985 | 40.802 | 40.861999999999995 | 31.073 | 42.958 | 47.671 | 49.633 | 31.073 | 6.531000000000001 | 0.932 | 0.11399999999999999 | 28.816999999999997 | 56.538999999999994 | 78.17699999999999 | 92.92200000000001 |
| MTEB CQADupstackMathematicaRetrieval | 18.397 | 27.256999999999998 | 28.541 | 28.658 | 22.761 | 32.248 | 33.171 | 33.227000000000004 | 22.761 | 32.879999999999995 | 38.913 | 41.504999999999995 | 22.761 | 6.045 | 1.044 | 0.13999999999999999 | 18.397 | 45.14 | 71.758 | 89.854 |
| MTEB CQADupstackPhysicsRetrieval | 30.604 | 42.132 | 43.419000000000004 | 43.527 | 37.824999999999996 | 47.696 | 48.483 | 48.53 | 37.824999999999996 | 48.421 | 53.568000000000005 | 55.574999999999996 | 37.824999999999996 | 8.758000000000001 | 1.319 | 0.168 | 30.604 | 61.605 | 82.787 | 95.78 |
| MTEB CQADupstackProgrammersRetrieval | 26.262999999999998 | 36.858999999999995 | 38.241 | 38.346999999999994 | 32.42 | 42.361 | 43.219 | 43.271 | 32.42 | 43.081 | 48.837 | 50.954 | 32.42 | 8.071 | 1.272 | 0.163 | 26.262999999999998 | 56.062999999999995 | 80.636 | 94.707 |
| MTEB CQADupstackRetrieval | 27.86616666666667 | 37.584999999999994 | 38.80291666666667 | 38.91358333333333 | 33.07566666666667 | 41.92366666666666 | 42.73516666666667 | 42.785666666666664 | 33.07566666666667 | 43.19875 | 48.32083333333334 | 50.418000000000006 | 33.07566666666667 | 7.581916666666666 | 1.1975 | 0.15699999999999997 | 27.86616666666667 | 55.449749999999995 | 77.92516666666666 | 92.31358333333333 |
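To help read the cutoff-based retrieval metrics, here is a minimal sketch of Precision@k, Recall@k, and the reciprocal rank for a single query. It uses simplified binary-relevance definitions and is not the MTEB evaluation code; the document IDs and function names are hypothetical. Benchmark figures are averages of such per-query values, and the values reported above appear to be on a 0 to 100 scale.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top-k results."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def reciprocal_rank(ranked_ids, relevant_ids, k):
    """1 / rank of the first relevant result within the top k, or 0.0 if none."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


# Hypothetical query with a single relevant document, retrieved at rank 3.
ranked = ["d7", "d2", "d5", "d9", "d1", "d4", "d8", "d3", "d6", "d0"]
relevant = {"d5"}

p10 = precision_at_k(ranked, relevant, 10)
r10 = recall_at_k(ranked, relevant, 10)
rr10 = reciprocal_rank(ranked, relevant, 10)
print(p10, r10, rr10)
```

With exactly one relevant document per query, Precision@k equals Recall@k divided by k under these definitions. The ArguAna figures are consistent with that pattern: Precision@10 is 8.919 and Recall@10 is 89.189.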
Clustering Tasks
| Dataset | V-Measure |
|---|---|
| MTEB ArxivClusteringP2P | 48.287087887491744 |
| MTEB ArxivClusteringS2S | 42.74244928943812 |
| MTEB BiorxivClusteringP2P | 39.035319537839484 |
| MTEB BiorxivClusteringS2S | 36.667313307057285 |
Reranking Tasks
| Dataset | MAP | MRR |
|---|---|---|
| MTEB AskUbuntuDupQuestions | 62.68814324295771 | 75.46266983247591 |
STS Tasks
| Dataset | Cosine Similarity Pearson | Cosine Similarity Spearman | Euclidean Pearson | Euclidean Spearman | Manhattan Pearson | Manhattan Spearman |
|---|---|---|---|---|---|---|
| MTEB BIOSSES | 90.45240209600391 | 87.95079919934645 | 88.93438602492702 | 88.28152962682988 | 88.92193964325268 | 88.21466063329498 |
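The STS metrics are correlations between the model's similarity scores and human similarity ratings. Pearson measures linear agreement, while Spearman measures agreement in rank order only. The following plain-Python sketch illustrates the difference; the ratings and scores are hypothetical, and the helper names are illustrative rather than taken from the evaluation code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: linear association between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical human ratings and model similarity scores for five sentence pairs.
gold = [0.5, 1.0, 2.5, 4.0, 5.0]
predicted = [0.10, 0.12, 0.30, 0.85, 0.99]

print(pearson(gold, predicted), spearman(gold, predicted))
```

In this toy data the predicted scores preserve the ordering of the ratings exactly, so the Spearman correlation is 1 while the Pearson correlation is lower because the relationship is not perfectly linear.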
Bitext Mining Tasks
| Dataset | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|
| MTEB BUCC (de-en) | 15.605427974947808 | 14.989877233698866 | 14.77906814441261 | 15.605427974947808 |
| MTEB BUCC (fr-en) | 33.38102575390711 | 32.41704114719127 | 32.057363829835964 | 33.38102575390711 |
| MTEB BUCC (ru-en) | 0.1939729823346034 | 0.17832215223820772 | 0.17639155671715423 | 0.1939729823346034 |
| MTEB BUCC (zh-en) | 3.0542390731964195 | 2.762857644374232 | 2.6505178163945935 | 3.0542390731964195 |
📄 License
This project is licensed under the MIT License.