🚀 sbert-chinese-general-v1
This model is designed for sentence-similarity tasks and can be used for feature extraction, semantic search, and related applications; it is particularly suited to Chinese text.
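A minimal usage sketch with the sentence-transformers library; the hub id and the example sentences are illustrative assumptions (the id may need an organization prefix):

```python
# Minimal sketch: encode two Chinese sentences and compare them with cosine similarity.
# The hub id below is assumed and may need an organization prefix.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sbert-chinese-general-v1")  # assumed model id

sentences = ["如何更换花呗绑定银行卡", "花呗更改绑定银行卡"]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity between the two sentence embeddings
print(float(util.cos_sim(embeddings[0], embeddings[1])))
```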
📚 Documentation
Model Information
| Property | Details |
|----------|---------|
| Pipeline Tag | sentence-similarity |
| Tags | sentence-transformers, feature-extraction, sentence-similarity, transformers, semantic-search, chinese, mteb |
| Model Name | sbert-chinese-general-v1 |
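Given the semantic-search tag above, here is a small retrieval sketch using sentence_transformers.util.semantic_search; the corpus, query, and hub id are illustrative assumptions, not part of the original card:

```python
# Semantic-search sketch: embed a corpus and a query, then rank by cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sbert-chinese-general-v1")  # assumed hub id

corpus = ["北京是中国的首都", "机器学习是人工智能的一个分支", "今天天气很好"]
queries = ["中国的首都是哪里"]

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embeddings = model.encode(queries, convert_to_tensor=True)

# For each query, return the top_k most similar corpus entries (cosine similarity).
hits = util.semantic_search(query_embeddings, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(corpus[hit["corpus_id"]], hit["score"])
```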
Results
The model has been evaluated on multiple datasets and tasks from the MTEB/C-MTEB benchmark; detailed results are listed below. Metric prefixes such as cos_sim, euclidean, manhattan, and dot denote the similarity or distance function applied to the sentence embeddings (see the sketch after the STS results).
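A minimal reproduction sketch with the mteb package, assuming the C-MTEB task names reported below are registered under the installed mteb version and that the model loads by the hub id shown:

```python
# Hedged sketch: run a subset of the C-MTEB tasks reported below with the mteb harness.
# Task names and availability depend on the installed mteb version.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sbert-chinese-general-v1")  # assumed hub id

evaluation = MTEB(tasks=["AFQMC", "LCQMC"])  # two of the STS tasks reported below
evaluation.run(model, output_folder="results/sbert-chinese-general-v1")
```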
STS Task
- Dataset: C-MTEB/AFQMC (Validation Split)
- cos_sim_pearson: 22.293919432958074
- cos_sim_spearman: 22.56718923553609
- euclidean_pearson: 22.525656322797026
- euclidean_spearman: 22.56718923553609
- manhattan_pearson: 22.501773028824065
- manhattan_spearman: 22.536992587828397
- Dataset: C-MTEB/ATEC (Test Split)
- cos_sim_pearson: 30.33575274463879
- cos_sim_spearman: 30.298708742167772
- euclidean_pearson: 32.33094743729218
- euclidean_spearman: 30.298710993858734
- manhattan_pearson: 32.31155376195945
- manhattan_spearman: 30.267669681690744
- Dataset: C-MTEB/BQ (Test Split)
- cos_sim_pearson: 41.493256724214255
- cos_sim_spearman: 40.98395961967895
- euclidean_pearson: 41.12345737966565
- euclidean_spearman: 40.983959619555996
- manhattan_pearson: 41.02584539471014
- manhattan_spearman: 40.87549513383032
- Dataset: C-MTEB/LCQMC (Test Split)
- cos_sim_pearson: 62.84331627244547
- cos_sim_spearman: 68.39990265073726
- euclidean_pearson: 66.87431827169324
- euclidean_spearman: 68.39990264979167
- manhattan_pearson: 66.89702078900328
- manhattan_spearman: 68.42107302159141
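As noted above, the cos_sim_* figures correlate cosine-similarity scores between embedding pairs with gold similarity labels. A minimal sketch of that computation (the sentence pairs and gold labels are made up for illustration):

```python
# Illustrative only: Spearman correlation between cosine-similarity scores and
# gold labels, mirroring the cos_sim_spearman metric reported above.
from scipy.stats import spearmanr
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sbert-chinese-general-v1")  # assumed hub id

pairs = [("花呗如何还款", "花呗怎么还钱"), ("我想打篮球", "今晚吃什么"), ("天气真好", "今天天气不错")]
gold = [1.0, 0.0, 1.0]  # made-up gold similarity labels

emb_a = model.encode([a for a, _ in pairs], convert_to_tensor=True)
emb_b = model.encode([b for _, b in pairs], convert_to_tensor=True)
cos_scores = util.cos_sim(emb_a, emb_b).diagonal().tolist()

print(spearmanr(cos_scores, gold).correlation)
```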
Classification Task
- Dataset: mteb/amazon_reviews_multi (Test Split, zh Config)
- accuracy: 37.507999999999996
- f1: 36.436808400753286
- Dataset: C-MTEB/IFlyTek-classification (Validation Split)
- accuracy: 44.878799538283964
- f1: 33.84678310261366
- Dataset: C-MTEB/JDReview-classification (Test Split)
- accuracy: 82.1951219512195
- ap: 46.78292030042397
- f1: 76.20482468514128
BitextMining Task
- Dataset: mteb/bucc-bitext-mining (Test Split, zh-en Config)
- accuracy: 9.794628751974724
- f1: 9.350535369492716
- precision: 9.179392662804986
- recall: 9.794628751974724
Clustering Task
Reranking Task
- Dataset: C-MTEB/CMedQAv1-reranking (Test Split)
- map: 53.06586280826805
- mrr: 59.58781746031746
- Dataset: C-MTEB/CMedQAv2-reranking (Test Split)
- map: 52.83635946154306
- mrr: 59.315079365079356
- Dataset: C-MTEB/Mmarco-reranking (Dev Split)
- map: 9.28600891904827
- mrr: 8.057936507936509
Retrieval Task
- Dataset: C-MTEB/CmedqaRetrieval (Dev Split)
- map_at_1: 5.721
- map_at_10: 8.645
- map_at_100: 9.434
- map_at_1000: 9.586
- map_at_3: 7.413
- map_at_5: 8.05
- mrr_at_1: 9.626999999999999
- mrr_at_10: 13.094
- mrr_at_100: 13.854
- mrr_at_1000: 13.958
- mrr_at_3: 11.724
- mrr_at_5: 12.409
- ndcg_at_1: 9.626999999999999
- ndcg_at_10: 11.35
- ndcg_at_100: 15.593000000000002
- ndcg_at_1000: 19.619
- ndcg_at_3: 9.317
- ndcg_at_5: 10.049
- precision_at_1: 9.626999999999999
- precision_at_10: 2.796
- precision_at_100: 0.629
- precision_at_1000: 0.11800000000000001
- precision_at_3: 5.476
- precision_at_5: 4.1209999999999996
- recall_at_1: 5.721
- recall_at_10: 15.190000000000001
- recall_at_100: 33.633
- recall_at_1000: 62.019999999999996
- recall_at_3: 9.099
- recall_at_5: 11.423
- Dataset: C-MTEB/CovidRetrieval (Dev Split)
- map_at_1: 26.976
- map_at_10: 35.18
- map_at_100: 35.921
- map_at_1000: 35.998999999999995
- map_at_3: 32.763
- map_at_5: 34.165
- mrr_at_1: 26.976
- mrr_at_10: 35.234
- mrr_at_100: 35.939
- mrr_at_1000: 36.016
- mrr_at_3: 32.771
- mrr_at_5: 34.172999999999995
- ndcg_at_1: 26.976
- ndcg_at_10: 39.635
- ndcg_at_100: 43.54
- ndcg_at_1000: 45.723
- ndcg_at_3: 34.652
- ndcg_at_5: 37.186
- precision_at_1: 26.976
- precision_at_10: 5.406
- precision_at_100: 0.736
- precision_at_1000: 0.091
- precision_at_3: 13.418
- precision_at_5: 9.293999999999999
- recall_at_1: 26.976
- recall_at_10: 53.766999999999996
- recall_at_100: 72.761
- recall_at_1000: 90.148
- recall_at_3: 40.095
- recall_at_5: 46.233000000000004
- Dataset: C-MTEB/DuRetrieval (Dev Split)
- map_at_1: 11.285
- map_at_10: 30.259000000000004
- map_at_100: 33.772000000000006
- map_at_1000: 34.037
- map_at_3: 21.038999999999998
- map_at_5: 25.939
- mrr_at_1: 45.1
- mrr_at_10: 55.803999999999995
- mrr_at_100: 56.301
- mrr_at_1000: 56.330999999999996
- mrr_at_3: 53.333
- mrr_at_5: 54.798
- ndcg_at_1: 45.1
- ndcg_at_10: 41.156
- ndcg_at_100: 49.518
- ndcg_at_1000: 52.947
- ndcg_at_3: 39.708
- ndcg_at_5: 38.704
- precision_at_1: 45.1
- precision_at_10: 20.75
- precision_at_100: 3.424
- precision_at_1000: 0.42700000000000005
- precision_at_3: 35.632999999999996
- precision_at_5: 30.080000000000002
- recall_at_1: 11.285
- recall_at_10: 43.242000000000004
- recall_at_100: 68.604
- recall_at_1000: 85.904
- recall_at_3: 24.404
- recall_at_5: 32.757
- Dataset: C-MTEB/EcomRetrieval (Dev Split)
- map_at_1: 21
- map_at_10: 28.364
- map_at_100: 29.199
- map_at_1000: 29.265
- map_at_3: 25.717000000000002
- map_at_5: 27.311999999999998
- mrr_at_1: 21
- mrr_at_10: 28.364
- mrr_at_100: 29.199
- mrr_at_1000: 29.265
- mrr_at_3: 25.717000000000002
- mrr_at_5: 27.311999999999998
- ndcg_at_1: 21
- ndcg_at_10: 32.708
- ndcg_at_100: 37.184
- ndcg_at_1000: 39.273
- ndcg_at_3: 27.372000000000003
- ndcg_at_5: 30.23
- precision_at_1: 21
- precision_at_10: 4.66
- precision_at_100: 0.685
- precision_at_1000: 0.086
- precision_at_3: 10.732999999999999
- precision_at_5: 7.82
- recall_at_1: 21
- recall_at_10: 46.6
- recall_at_100: 68.5
- recall_at_1000: 85.6
- recall_at_3: 32.2
- recall_at_5: 39.1
PairClassification Task
- Dataset: C-MTEB/CMNLI (Validation Split)
- cos_sim_accuracy: 77.36620565243535
- cos_sim_ap: 85.92291866877001
- cos_sim_f1: 78.19390231037029
- cos_sim_precision: 71.24183006535948
- cos_sim_recall: 86.64952069207388
- dot_accuracy: 77.36620565243535
- dot_ap: 85.94113738490068
- dot_f1: 78.19390231037029
- dot_precision: 71.24183006535948
- dot_recall: 86.64952069207388
- euclidean_accuracy: 77.36620565243535
- euclidean_ap: 85.92291893444687
- euclidean_f1: 78.19390231037029
- euclidean_precision: 71.24183006535948
- euclidean_recall: 86.64952069207388
- manhattan_accuracy: 77.29404690318701
- manhattan_ap: 85.88284362100919
- manhattan_f1: 78.17836812144213
- manhattan_precision: 71.18448838548666
- manhattan_recall: 86.69628244096329
- max_accuracy: 77.36620565243535
- max_ap: 85.94113738490068
- max_f1: 78.19390231037029