# 🚀 gte-large-zh

This model is based on thenlper/gte-large-zh and has been evaluated on multiple tasks and datasets in the MTEB benchmark, demonstrating its performance across natural language processing tasks such as sentence similarity, classification, clustering, reranking, and retrieval.
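The tags below suggest the model is used through the Sentence Transformers library. A minimal sketch of how similarity scores like those in the tables are obtained; the model download is left as a comment since it requires network access, and the toy vectors stand in for real embeddings:

```python
import numpy as np

# Real usage would load the base model via sentence-transformers
# (an assumption based on the tags; downloading the weights needs network access):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("thenlper/gte-large-zh")
#   embeddings = model.encode(["今天天气真好", "今天天气不错"])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-in embeddings; the real model produces much higher-dimensional vectors.
embeddings = np.array([[0.2, 0.8, 0.1],
                       [0.25, 0.75, 0.05]])
print(round(cosine_similarity(embeddings[0], embeddings[1]), 4))
```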
## 📚 Documentation

### Model Information

| Property | Details |
| ---- | ---- |
| Base Model | thenlper/gte-large-zh |
| License | MIT |
| Tags | mteb, sentence-similarity, sentence-transformers, Sentence Transformers, llama-cpp, gguf-my-repo |
### Model Performance Results

#### 1. STS Task Results

| Dataset | cos_sim_pearson | cos_sim_spearman | euclidean_pearson | euclidean_spearman | manhattan_pearson | manhattan_spearman |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| MTEB AFQMC (validation) | 48.94131905219026 | 54.58261199731436 | 52.73929210805982 | 54.582632097533676 | 52.73123295724949 | 54.572941830465794 |
| MTEB ATEC (test) | 47.292931669579005 | 54.601019783506466 | 54.61393532658173 | 54.60101865708542 | 54.59369555606305 | 54.601098593646036 |
| MTEB BQ (test) | 62.55033151404683 | 64.40573802644984 | 62.93453281081951 | 64.40574149035828 | 62.839969210895816 | 64.30837945045283 |
| MTEB LCQMC (test) | 69.50947272908907 | 74.40054474949213 | 73.53007373987617 | 74.40054474732082 | 73.51396571849736 | 74.38395696630835 |
| MTEB PAWSX (test) | 35.301730226895955 | 38.54612530948101 | 39.02831131230217 | 38.54612530948101 | 39.04765584936325 | 38.54455759013173 |
| MTEB QBQTC (test) | 32.27907454729754 | 33.35945567162729 | 31.997628193815725 | 33.3592386340529 | 31.97117833750544 | 33.30857326127779 |
| MTEB STS22 (zh) (test) | 62.53712784446981 | 62.975074386224286 | 61.791207731290854 | 62.975073716988064 | - | - |
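The STS columns are Pearson and Spearman correlations (×100) between model similarity scores and human gold labels. A rough sketch of how one such pair of numbers is computed, using toy scores rather than MTEB data (real MTEB runs use scipy's implementations):

```python
import numpy as np

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation coefficient between two score vectors."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman correlation: Pearson on rank-transformed values (no-ties case)."""
    rank = lambda v: np.argsort(np.argsort(v))
    return pearson(rank(x), rank(y))

# Toy stand-ins: model cosine similarities vs. human-annotated gold labels.
model_scores = np.array([0.91, 0.35, 0.78, 0.12, 0.66])
gold_labels = np.array([4.8, 1.5, 4.0, 0.5, 3.2])
print(round(pearson(model_scores, gold_labels) * 100, 2),
      round(spearman(model_scores, gold_labels) * 100, 2))
```

Spearman only looks at rank order, which is why it can reach 100 even when Pearson does not.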
#### 2. Classification Task Results

| Dataset | Accuracy | F1 | AP |
| ---- | ---- | ---- | ---- |
| MTEB AmazonReviewsClassification (zh) (test) | 47.233999999999995 | 45.68998446563349 | - |
| MTEB IFlyTek (validation) | 49.60369372835706 | 38.24016248875209 | - |
| MTEB JDReview (test) | 86.71669793621012 | 81.59033162805417 | 55.75807094995178 |
| MTEB MassiveIntentClassification (zh-CN) (test) | 73.30531271015468 | 70.88091430578575 | - |
| MTEB MassiveScenarioClassification (zh-CN) (test) | 75.7128446536651 | 75.06125593532262 | - |
| MTEB MultilingualSentiment (validation) | 76.47666666666667 | 76.4808576632057 | - |
| MTEB OnlineShopping (test) | 92.68 | 92.67153098230253 | 90.78652757815115 |
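In the classification table, Accuracy is the fraction of correct predictions, and F1 is taken here to be the unweighted mean of per-class F1 scores (macro averaging is an assumption about the MTEB setup, not stated in this card). A minimal sketch on toy labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    f1s = []
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Toy 3-class example: one item of class 0 is misclassified as class 1.
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
print(accuracy(y_true, y_pred), round(macro_f1(y_true, y_pred), 4))
```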
#### 3. Clustering Task Results

| Dataset | v_measure |
| ---- | ---- |
| MTEB CLSClusteringP2P (test) | 42.098169316685045 |
| MTEB CLSClusteringS2S (test) | 38.90716707051822 |
#### 4. Reranking Task Results

| Dataset | MAP | MRR |
| ---- | ---- | ---- |
| MTEB CMedQAv1 (test) | 86.09191911031553 | 88.6747619047619 |
| MTEB CMedQAv2 (test) | 86.45781885502122 | 89.01591269841269 |
| MTEB MMarcoReranking (dev) | 31.188333827724108 | 29.84801587301587 |
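The reranking table reports MAP and MRR; MRR averages, over queries, the reciprocal rank of the first relevant document. A self-contained sketch on toy relevance judgments (not CMedQA data):

```python
def mean_reciprocal_rank(rankings):
    """MRR: mean over queries of 1/rank of the first relevant result (0 if none)."""
    total = 0.0
    for ranking in rankings:
        for rank, relevant in enumerate(ranking, start=1):
            if relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)

# Three toy queries: first relevant hit at ranks 1, 2, and never.
print(mean_reciprocal_rank([[1, 0, 0], [0, 1, 0], [0, 0, 0]]))
```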
#### 5. Retrieval Task Results
| Dataset | map_at_1 | map_at_10 | map_at_100 | map_at_1000 | map_at_3 | map_at_5 | mrr_at_1 | mrr_at_10 | mrr_at_100 | mrr_at_1000 | mrr_at_3 | mrr_at_5 | ndcg_at_1 | ndcg_at_10 | ndcg_at_100 | ndcg_at_1000 | ndcg_at_3 | ndcg_at_5 | precision_at_1 | precision_at_10 | precision_at_100 | precision_at_1000 | precision_at_3 | precision_at_5 | recall_at_1 | recall_at_10 | recall_at_100 | recall_at_1000 | recall_at_3 | recall_at_5 |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| MTEB CmedqaRetrieval (dev) | 24.215 | 36.498000000000005 | 38.409 | 38.524 | 32.428000000000004 | 34.664 | 36.834 | 45.196 | 46.214 | 46.259 | 42.631 | 44.044 | 36.834 | 43.146 | 50.632999999999996 | 52.608999999999995 | 37.851 | 40.005 | 36.834 | 9.647 | 1.574 | 0.183 | 21.48 | 15.649 | 24.215 | 54.079 | 84.943 | 98.098 | 38.117000000000004 | 44.775999999999996 |
| MTEB CovidRetrieval (dev) | 78.583 | 85.613 | 85.777 | 85.77900000000001 | 84.58 | 85.22800000000001 | 78.925 | 85.667 | 85.822 | 85.824 | 84.651 | 85.299 | 78.925 | 88.405 | 89.02799999999999 | 89.093 | 86.393 | 87.5 | 78.925 | 9.789 | 1.005 | 0.101 | 30.769000000000002 | 19.031000000000002 | 78.583 | 96.891 | 99.473 | 100.0 | 91.438 | 94.152 |
| MTEB DuRetrieval (dev) | 25.604 | 77.171 | 80.033 | 80.099 | 54.364000000000004 | 68.024 | 89.85 | 93.009 | 93.065 | 93.068 | 92.72500000000001 | 92.915 | 89.85 | 85.038 | 88.247 | 88.837 | 85.20299999999999 | 83.47 | 89.85 | 40.275 | 4.709 | 0.486 | 76.36699999999999 | 63.75999999999999 | 25.604 | 85.423 | 95.695 | 98.669 | 56.737 | 72.646 |
| MTEB EcomRetrieval (dev) | 51.800000000000004 | 62.17 | 62.649 | 62.663000000000004 | 59.699999999999996 | 61.23499999999999 | 51.800000000000004 | 62.17 | 62.649 | 62.663000000000004 | 59.699999999999996 | 61.23499999999999 | 51.800000000000004 | 67.246 | 69.58 | 69.925 | 62.197 | 64.981 | 51.800000000000004 | 8.32 | 0.941 | 0.097 | 23.133 | 15.24 | 51.800000000000004 | 83.2 | 94.1 | 96.8 | 69.39999999999999 | 76.2 |
| MTEB MMarcoRetrieval (dev) | 64.685 | 73.803 | 74.153 | 74.167 | 71.98 | 73.21600000000001 | 66.891 | 74.48700000000001 | 74.788 | 74.801 | 72.918 | 73.965 | 66.891 | 77.534 | 79.106 | 79.494 | 74.13499999999999 | 76.20700000000001 | 66.891 | 9.375 | 1.0170000000000001 | 0.105 | 27.932000000000002 | 17.86 | 64.685 | 88.298 | 95.426 | 98.48700000000001 | 79.44200000000001 | 84.358 |
| MTEB MedicalRetrieval (dev) | 52.7 | 59.532 | 60.085 | 60.126000000000005 | 57.767 | 58.952000000000005 | 52.900000000000006 | 59.648999999999994 | 60.20100000000001 | 60.242 | 57.882999999999996 | 59.068 | 52.7 | 62.883 | 65.714 | 66.932 | 59.34700000000001 | 61.486 | 52.7 | 7.340000000000001 | 0.8699999999999999 | 0.097 | 21.3 | 13.819999999999999 | 52.7 | 73.4 | 87.0 | 96.8 | 63.9 | 69.1 |
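The retrieval columns follow the `metric_at_k` convention: each metric is computed over the top-k retrieved documents. As one example, nDCG@k divides the discounted gain of the system's ranking by that of the ideal ranking; a sketch on toy relevance judgments, using the standard log2 discount:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results in ranked order."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k):
    """nDCG@k: DCG of the system ranking divided by the ideal (sorted) DCG."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0

# Toy judgments for one query (1 = relevant, 0 = not),
# in the order the retriever returned the documents.
ranked = [1, 0, 1, 1, 0]
print(round(ndcg_at_k(ranked, 5), 4))
```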
#### 6. Pair Classification Task Results

| Metric | MTEB Cmnli (validation) | MTEB Ocnli (validation) |
| ---- | ---- | ---- |
| cos_sim_accuracy | 82.51352976548407 | 77.58527341635084 |
| cos_sim_ap | 89.49905141462749 | 79.32131557636497 |
| cos_sim_f1 | 83.89334489486234 | 80.51948051948052 |
| cos_sim_precision | 78.19761567993534 | 71.7948717948718 |
| cos_sim_recall | 90.48398410100538 | 91.65786694825766 |
| dot_accuracy | 82.51352976548407 | 77.58527341635084 |
| dot_ap | 89.49108293121158 | 79.32131557636497 |
| dot_f1 | 83.89334489486234 | 80.51948051948052 |
| dot_precision | 78.19761567993534 | 71.7948717948718 |
| dot_recall | 90.48398410100538 | 91.65786694825766 |
| euclidean_accuracy | 82.51352976548407 | 77.58527341635084 |
| euclidean_ap | 89.49904709975154 | 79.32131557636497 |
| euclidean_f1 | 83.89334489486234 | 80.51948051948052 |
| euclidean_precision | 78.19761567993534 | 71.7948717948718 |
| euclidean_recall | 90.48398410100538 | 91.65786694825766 |
| manhattan_accuracy | 82.48947684906794 | 77.15213860314023 |
| manhattan_ap | 89.49231995962901 | 79.26178519246496 |
| manhattan_f1 | 83.84681215233205 | 80.22028453418999 |
| manhattan_precision | 77.28258726089528 | 70.94155844155844 |
| manhattan_recall | 91.62964694879588 | 92.29144667370645 |
| max_accuracy | 82.51352976548407 | 77.58527341635084 |
| max_ap | 89.49905141462749 | 79.32131557636497 |
| max_f1 | 83.89334489486234 | 80.51948051948052 |
## 📄 License
This project is licensed under the MIT license.