# 🚀 tao-8k-origin
This model is designed for sentence-similarity tasks and performs strongly across multiple datasets and task types, including STS, Classification, Clustering, Reranking, Retrieval, and PairClassification.
## 📚 Documentation
### Model Information
| Property | Details |
|----------|---------|
| Pipeline Tag | sentence-similarity |
| Tags | sentence-transformers, feature-extraction, sentence-similarity, mteb |
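Given the `sentence-similarity` pipeline tag and the sentence-transformers format, usage follows the standard `SentenceTransformer` API. A minimal sketch, assuming `tao-8k-origin` resolves to a local path or Hub repo id for this checkpoint:

```python
from sentence_transformers import SentenceTransformer, util

# "tao-8k-origin" is assumed to be a local path or a Hub repo id
# pointing at this checkpoint.
model = SentenceTransformer("tao-8k-origin")

sentences = ["今天天气真好", "今天天气不错", "我要去上班了"]
embeddings = model.encode(sentences, normalize_embeddings=True)

# Pairwise cosine similarities between all sentence embeddings.
scores = util.cos_sim(embeddings, embeddings)
print(scores)
```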
### Model Performance
Model Name: tao-8k-origin
The model has been tested on a variety of datasets and tasks; per-task performance metrics are listed below.
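All scores follow the MTEB reporting convention (metric values scaled by 100). They can in principle be reproduced with the `mteb` harness; a minimal sketch, assuming the installed `mteb` version ships the C-MTEB tasks and accepts task names as strings:

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("tao-8k-origin")  # assumed model id

# Run one of the C-MTEB tasks reported below; per-task scores are
# written as JSON under the output folder.
evaluation = MTEB(tasks=["AFQMC"])
evaluation.run(model, output_folder="results/tao-8k-origin")
```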
#### 1. STS Task
- Dataset: MTEB AFQMC (C-MTEB/AFQMC)
- Split: validation
| Metric Type | Value |
|-------------|-------|
| cos_sim_pearson | 47.33644889578121 |
| cos_sim_spearman | 49.93968642502866 |
| euclidean_pearson | 48.12029792973887 |
| euclidean_spearman | 49.939666315145494 |
| manhattan_pearson | 48.07449594650583 |
| manhattan_spearman | 49.892461433911166 |
- Dataset: MTEB ATEC (C-MTEB/ATEC)
- Split: test
| Metric Type | Value |
|-------------|-------|
| cos_sim_pearson | 50.976148098905746 |
| cos_sim_spearman | 53.11230114448237 |
| euclidean_pearson | 55.119977161851054 |
| euclidean_spearman | 53.11229776647941 |
| manhattan_pearson | 55.096968162828034 |
| manhattan_spearman | 53.107481302419465 |
- Dataset: MTEB BQ (C-MTEB/BQ)
- Split: test
| Metric Type | Value |
|-------------|-------|
| cos_sim_pearson | 62.843816050026824 |
| cos_sim_spearman | 65.54142642656706 |
| euclidean_pearson | 64.08809634876388 |
| euclidean_spearman | 65.54142642558392 |
| manhattan_pearson | 64.09391522108272 |
| manhattan_spearman | 65.55445491162718 |
- Dataset: MTEB LCQMC (C-MTEB/LCQMC)
- Split: test
| Metric Type | Value |
|-------------|-------|
| cos_sim_pearson | 71.17874301231225 |
| cos_sim_spearman | 77.47936067899236 |
| euclidean_pearson | 76.3241109984839 |
| euclidean_spearman | 77.47936511149533 |
| manhattan_pearson | 76.3334642249198 |
| manhattan_spearman | 77.48889610190774 |
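The STS metrics above are correlation coefficients between model similarity scores and gold annotations, computed per distance function. A toy sketch of the cosine variants (not the benchmark data; `tao-8k-origin` is the assumed model id):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("tao-8k-origin")  # assumed model id

# Toy stand-in for an STS split: sentence pairs with gold scores.
pairs = [("今天天气真好", "今天天气不错"),
         ("他在看书", "她在跑步"),
         ("我喜欢猫", "我喜欢小猫")]
gold = [4.5, 0.5, 4.0]

emb_a = model.encode([a for a, _ in pairs], normalize_embeddings=True)
emb_b = model.encode([b for _, b in pairs], normalize_embeddings=True)

# With L2-normalized embeddings, the dot product is cosine similarity.
cos_scores = np.sum(emb_a * emb_b, axis=1)

# Correspond to cos_sim_pearson / cos_sim_spearman in the tables above.
print("pearson: ", pearsonr(cos_scores, gold)[0])
print("spearman:", spearmanr(cos_scores, gold)[0])
```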
#### 2. Classification Task
- Dataset: MTEB AmazonReviewsClassification (zh) (mteb/amazon_reviews_multi)
- Split: test
| Metric Type | Value |
|-------------|-------|
| accuracy | 40.804 |
| f1 | 39.01066543513968 |
- Dataset: MTEB IFlyTek (C-MTEB/IFlyTek-classification)
- Split: validation
| Metric Type | Value |
|-------------|-------|
| accuracy | 47.98768757214313 |
| f1 | 35.24243089488371 |
- Dataset: MTEB JDReview (C-MTEB/JDReview-classification)
- Split: test
| Metric Type | Value |
|-------------|-------|
| accuracy | 87.01688555347093 |
| ap | 56.39167630414159 |
| f1 | 81.91756262306008 |
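Note that MTEB classification is not zero-shot: the harness typically fits a logistic-regression classifier on embeddings of the train split and scores accuracy/F1 on the test split. A toy sketch of that protocol (hypothetical data, assumed model id):

```python
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

model = SentenceTransformer("tao-8k-origin")  # assumed model id

# Hypothetical toy split; the benchmarks use the datasets listed above.
train_texts, train_y = ["质量很好", "太差了", "非常满意", "很失望"], [1, 0, 1, 0]
test_texts, test_y = ["还不错", "很糟糕"], [1, 0]

clf = LogisticRegression(max_iter=1000)
clf.fit(model.encode(train_texts), train_y)
pred = clf.predict(model.encode(test_texts))

print("accuracy:", accuracy_score(test_y, pred))
print("macro f1:", f1_score(test_y, pred, average="macro"))
```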
#### 3. Clustering Task
#### 4. Reranking Task
- Dataset: MTEB CMedQAv1 (C-MTEB/CMedQAv1-reranking)
- Split: test
| Metric Type | Value |
|-------------|-------|
| map | 85.34294439514511 |
| mrr | 88.03849206349206 |
- Dataset: MTEB CMedQAv2 (C-MTEB/CMedQAv2-reranking)
- Split: test
| Metric Type | Value |
|-------------|-------|
| map | 85.81294364673899 |
| mrr | 88.52146825396825 |
- Dataset: MTEB MMarcoReranking (C-MTEB/Mmarco-reranking)
- Split: dev
| Metric Type | Value |
|-------------|-------|
| map | 27.96872431410137 |
| mrr | 26.92023809523809 |
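Here `map` is mean average precision and `mrr` is mean reciprocal rank over the reranked candidate lists. A minimal self-contained MRR sketch on toy data:

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Average 1/rank of the first relevant candidate per query."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

# Two toy queries: first relevant hit at rank 2 and at rank 3.
print(mean_reciprocal_rank([["d2", "d1"], ["d5", "d3", "d4"]],
                           [{"d1"}, {"d4"}]))  # (1/2 + 1/3) / 2 ≈ 0.417
```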
#### 5. Retrieval Task
- Dataset: MTEB CmedqaRetrieval (C-MTEB/CmedqaRetrieval)
- Split: dev
| Metric Type | Value |
|-------------|-------|
| map_at_1 | 23.982 |
| map_at_10 | 36.21 |
| map_at_100 | 38.072 |
| map_at_1000 | 38.194 |
| map_at_3 | 32.239000000000004 |
| map_at_5 | 34.377 |
| mrr_at_1 | 36.858999999999995 |
| mrr_at_10 | 45.084999999999994 |
| mrr_at_100 | 46.104 |
| mrr_at_1000 | 46.154 |
| mrr_at_3 | 42.623 |
| mrr_at_5 | 43.995 |
| ndcg_at_1 | 36.858999999999995 |
| ndcg_at_10 | 42.735 |
| ndcg_at_100 | 50.181 |
| ndcg_at_1000 | 52.309000000000005 |
| ndcg_at_3 | 37.728 |
| ndcg_at_5 | 39.664 |
| precision_at_1 | 36.858999999999995 |
| precision_at_10 | 9.615 |
| precision_at_100 | 1.564 |
| precision_at_1000 | 0.183 |
| precision_at_3 | 21.514 |
| precision_at_5 | 15.568999999999999 |
| recall_at_1 | 23.982 |
| recall_at_10 | 53.04600000000001 |
| recall_at_100 | 84.113 |
| recall_at_1000 | 98.37 |
| recall_at_3 | 37.824999999999996 |
| recall_at_5 | 44.023 |
- Dataset: MTEB CovidRetrieval (C-MTEB/CovidRetrieval)
- Split: dev
| Metric Type | Value |
|-------------|-------|
| map_at_1 | 70.706 |
| map_at_10 | 78.619 |
| map_at_100 | 78.915 |
| map_at_1000 | 78.918 |
| map_at_3 | 76.967 |
| map_at_5 | 77.922 |
| mrr_at_1 | 70.917 |
| mrr_at_10 | 78.64 |
| mrr_at_100 | 78.935 |
| mrr_at_1000 | 78.938 |
| mrr_at_3 | 77.081 |
| mrr_at_5 | 77.972 |
| ndcg_at_1 | 70.917 |
| ndcg_at_10 | 82.186 |
| ndcg_at_100 | 83.487 |
| ndcg_at_1000 | 83.589 |
| ndcg_at_3 | 78.874 |
| ndcg_at_5 | 80.548 |
| precision_at_1 | 70.917 |
| precision_at_10 | 9.431000000000001 |
| precision_at_100 | 1.001 |
| precision_at_1000 | 0.101 |
| precision_at_3 | 28.275 |
| precision_at_5 | 17.829 |
| recall_at_1 | 70.706 |
| recall_at_10 | 93.256 |
| recall_at_100 | 99.05199999999999 |
| recall_at_1000 | 99.895 |
| recall_at_3 | 84.247 |
| recall_at_5 | 88.251 |
- Dataset: MTEB DuRetrieval (C-MTEB/DuRetrieval)
- Split: dev
| Metric Type | Value |
|-------------|-------|
| map_at_1 | 25.989 |
| map_at_10 | 80.882 |
| map_at_100 | 83.63199999999999 |
| map_at_1000 | 83.663 |
| map_at_3 | 55.772 |
| map_at_5 | 70.598 |
| mrr_at_1 | 90.14999999999999 |
| mrr_at_10 | 93.30000000000001 |
| mrr_at_100 | 93.363 |
| mrr_at_1000 | 93.366 |
| mrr_at_3 | 93.083 |
| mrr_at_5 | 93.206 |
| ndcg_at_1 | 90.14999999999999 |
| ndcg_at_10 | 88.016 |
| ndcg_at_100 | 90.52900000000001 |
| ndcg_at_1000 | 90.84400000000001 |
| ndcg_at_3 | 86.529 |
| ndcg_at_5 | 85.65899999999999 |
| precision_at_1 | 90.14999999999999 |
| precision_at_10 | 42.295 |
| precision_at_100 | 4.826 |
| precision_at_1000 | 0.48900000000000005 |
| precision_at_3 | 77.717 |
| precision_at_5 | 65.81 |
| recall_at_1 | 25.989 |
| recall_at_10 | 89.446 |
| recall_at_100 | 97.832 |
| recall_at_1000 | 99.568 |
| recall_at_3 | 58.223 |
| recall_at_5 | 75.411 |
- Dataset: MTEB EcomRetrieval (C-MTEB/EcomRetrieval)
- Split: dev
| Metric Type | Value |
|-------------|-------|
| map_at_1 | 49.6 |
| map_at_10 | 59.512 |
| map_at_100 | 60.059 |
| map_at_1000 | 60.077999999999996 |
| map_at_3 | 56.882999999999996 |
| map_at_5 | 58.298 |
| mrr_at_1 | 49.6 |
| mrr_at_10 | 59.512 |
| mrr_at_100 | 60.059 |
| mrr_at_1000 | 60.077999999999996 |
| mrr_at_3 | 56.882999999999996 |
| mrr_at_5 | 58.298 |
| ndcg_at_1 | 49.6 |
| ndcg_at_10 | 64.71000000000001 |
| ndcg_at_100 | 67.238 |
| ndcg_at_1000 | 67.74 |
| ndcg_at_3 | 59.275 |
| ndcg_at_5 | 61.805 |
| precision_at_1 | 49.6 |
| precision_at_10 | 8.12 |
| precision_at_100 | 0.927 |
| precision_at_1000 | 0.097 |
| precision_at_3 | 22.067 |
| precision_at_5 | 14.46 |
| recall_at_1 | 49.6 |
| recall_at_10 | 81.2 |
| recall_at_100 | 92.7 |
| recall_at_1000 | 99.6 |
| recall_at_3 | 66.2 |
| recall_at_5 | 72.3 |
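The retrieval tables report standard rank-cutoff metrics. A minimal per-query sketch of `precision_at_k` / `recall_at_k` on toy data (the benchmark averages these over all queries):

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision and recall within the top-k retrieved documents."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / k, hits / len(relevant_ids)

# Toy example: five documents ranked by similarity for one query.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4"}
p, r = precision_recall_at_k(ranked, relevant, k=5)
print(f"precision@5={p:.3f} recall@5={r:.3f}")  # 0.400, 0.667
```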
#### 6. PairClassification Task
- Dataset: MTEB Cmnli (C-MTEB/CMNLI)
- Split: validation
| Metric Type | Value |
|-------------|-------|
| cos_sim_accuracy | 77.71497294046902 |
| cos_sim_ap | 86.84526989595028 |
| cos_sim_f1 | 79.31987247608926 |
| cos_sim_precision | 72.70601987142022 |
| cos_sim_recall | 87.2574234276362 |
| dot_accuracy | 77.71497294046902 |
| dot_ap | 86.83880734247957 |
| dot_f1 | 79.31987247608926 |
| dot_precision | 72.70601987142022 |
| dot_recall | 87.2574234276362 |
| euclidean_accuracy | 77.71497294046902 |
| euclidean_ap | 86.84526869685902 |
| euclidean_f1 | 79.31987247608926 |
| euclidean_precision | 72.70601987142022 |
| euclidean_recall | 87.2574234276362 |
| manhattan_accuracy | 77.8111846061335 |
| manhattan_ap | 86.81142881585656 |
| manhattan_f1 | 79.4201671780764 |
| manhattan_precision | 72.53575570158485 |
| manhattan_recall | 87.74842179097499 |
| max_accuracy | 77.8111846061335 |
| max_ap | 86.84526989595028 |
| max_f1 | 79.4201671780764 |
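PairClassification scores each sentence pair with a similarity function (cosine, dot, Euclidean, Manhattan) and reports each metric at its best decision threshold; the `max_*` rows take the best value across similarity functions. A toy sketch of best-threshold accuracy:

```python
import numpy as np

def best_threshold_accuracy(scores, labels):
    """Accuracy at the best similarity threshold (cf. cos_sim_accuracy)."""
    best = 0.0
    for t in np.unique(scores):
        acc = float(np.mean((scores >= t) == labels))
        best = max(best, acc)
    return best

scores = np.array([0.91, 0.35, 0.78, 0.12])    # pair similarities
labels = np.array([True, False, True, False])  # gold pair labels
print(best_threshold_accuracy(scores, labels))  # 1.0 at threshold 0.78
```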