🚀 jina-embeddings-v2-base-zh
jina-embeddings-v2-base-zh is a bilingual (Chinese/English) sentence-embedding model that can be used for feature extraction, sentence-similarity computation, and related downstream tasks. It has been evaluated on a broad range of MTEB/C-MTEB datasets; the detailed results are listed below.
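A minimal usage sketch, assuming the checkpoint is published on the Hugging Face Hub as `jinaai/jina-embeddings-v2-base-zh` and that `trust_remote_code=True` is required for its custom architecture:

```python
# Minimal sketch: encode sentences and compare them with cosine similarity.
# The Hub model id and the trust_remote_code flag are assumptions about the deployment.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("jinaai/jina-embeddings-v2-base-zh", trust_remote_code=True)

sentences = ["How is the weather today?", "今天天气怎么样?"]
embeddings = model.encode(sentences)            # feature extraction
print(cos_sim(embeddings[0], embeddings[1]))    # sentence similarity
```

All of the benchmark tasks below build on this same encode-then-compare pattern; only the downstream scorer changes.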
📚 Documentation
Model Information
| Property | Details |
|----------|---------|
| Model Name | jina-embeddings-v2-base-zh |
| Tags | sentence-transformers, feature-extraction, sentence-similarity, mteb, transformers, transformers.js |
| Inference | false |
| License | apache-2.0 |
| Languages Supported | en, zh |
Performance Results
The model has been evaluated on a range of tasks and datasets; the detailed results are listed below. After each task group, a short hedged code sketch illustrates how its metrics are typically computed.
1. STS (Semantic Textual Similarity) Tasks
- C-MTEB/AFQMC (Validation Split)
- cos_sim_pearson: 48.51403119231363
- cos_sim_spearman: 50.5928547846445
- euclidean_pearson: 48.750436310559074
- euclidean_spearman: 50.50950238691385
- manhattan_pearson: 48.7866189440328
- manhattan_spearman: 50.58692402017165
- C-MTEB/ATEC (Test Split)
- cos_sim_pearson: 50.25985700105725
- cos_sim_spearman: 51.28815934593989
- euclidean_pearson: 52.70329248799904
- euclidean_spearman: 50.94101139559258
- manhattan_pearson: 52.6647237400892
- manhattan_spearman: 50.922441325406176
- C-MTEB/BQ (Test Split)
- cos_sim_pearson: 65.15667035488342
- cos_sim_spearman: 66.07110142081
- euclidean_pearson: 60.447598102249714
- euclidean_spearman: 61.826575796578766
- manhattan_pearson: 60.39364279354984
- manhattan_spearman: 61.78743491223281
- mteb/sts22-crosslingual-sts (Test Split, zh Config)
- cos_sim_pearson: 66.54931957553592
- cos_sim_spearman: 69.25068863016632
- euclidean_pearson: 50.26525596106869
- euclidean_spearman: 63.83352741910006
- manhattan_pearson: 49.98798282198196
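The cos_sim_pearson and cos_sim_spearman figures are correlation coefficients between the model's cosine similarities and human-annotated similarity ratings. A sketch of that computation follows; the sentence pairs and gold ratings are illustrative placeholders, not the benchmark data:

```python
# Sketch of the STS protocol: correlate predicted cosine similarities with gold ratings.
# Pairs and gold scores below are placeholders only.
from scipy.stats import pearsonr, spearmanr
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("jinaai/jina-embeddings-v2-base-zh", trust_remote_code=True)

pairs = [("今天天气很好", "今天天气不错"),
         ("我喜欢猫", "股票市场下跌了"),
         ("他正在跑步", "他在慢跑")]
gold = [4.8, 0.3, 4.2]  # human similarity ratings (placeholders)

emb_a = model.encode([a for a, _ in pairs])
emb_b = model.encode([b for _, b in pairs])
pred = [float(cos_sim(a, b)) for a, b in zip(emb_a, emb_b)]

print("cos_sim_pearson: ", pearsonr(pred, gold)[0])
print("cos_sim_spearman:", spearmanr(pred, gold)[0])
```

The euclidean_* and manhattan_* variants report the same correlations with (negated) Euclidean or Manhattan distances in place of cosine similarity.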
2. Classification Tasks
- mteb/amazon_reviews_multi (Test Split, zh Config)
- accuracy: 34.944
- f1: 34.06478860660109
- C-MTEB/IFlyTek-classification (Validation Split)
- accuracy: 47.36437091188918
- f1: 36.60946954228577
- C-MTEB/JDReview-classification (Test Split)
- accuracy: 79.5684803001876
- ap: 42.671935929201524
- f1: 73.31912729103752
- mteb/amazon_massive_intent (Test Split, zh-CN Config)
- accuracy: 68.1977135171486
- f1: 67.23114308718404
- mteb/amazon_massive_scenario (Test Split, zh-CN Config)
- accuracy: 71.92669804976462
- f1: 72.90628475628779
- C-MTEB/MultilingualSentiment-classification (Validation Split)
- accuracy: 63.29333333333334
- f1: 63.03293854259612
- C-MTEB/OnlineShopping-classification (Test Split)
- accuracy: 87.00000000000001
- ap: 83.24372135949511
- f1: 86.95554191530607
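The classification accuracy/f1 scores come from training a lightweight classifier on frozen embeddings (MTEB typically uses a logistic-regression probe). A minimal sketch with placeholder texts and labels:

```python
# Sketch: fit a linear probe on frozen embeddings, then report accuracy / f1.
# Texts and labels are illustrative placeholders only.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jinaai/jina-embeddings-v2-base-zh", trust_remote_code=True)

train_texts, train_labels = ["物流很快，很满意", "质量太差了", "包装精美，还会再买"], [1, 0, 1]
test_texts, test_labels = ["非常好用", "完全不推荐"], [1, 0]

clf = LogisticRegression(max_iter=1000).fit(model.encode(train_texts), train_labels)
pred = clf.predict(model.encode(test_texts))

print("accuracy:", accuracy_score(test_labels, pred))
print("f1:      ", f1_score(test_labels, pred, average="macro"))
```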
3. Clustering Tasks
- C-MTEB/CLSClusteringP2P (Test Split)
- v_measure: 39.96714175391701
- C-MTEB/CLSClusteringS2S (Test Split)
- v_measure: 38.39863566717934
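The v_measure scores compare unsupervised clusters of the embeddings against gold category labels. A sketch under the assumption of a simple k-means clusterer, with placeholder data:

```python
# Sketch: cluster the embeddings with k-means and score the clusters with V-measure.
# Texts and labels are illustrative placeholders only.
from sklearn.cluster import KMeans
from sklearn.metrics import v_measure_score
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jinaai/jina-embeddings-v2-base-zh", trust_remote_code=True)

texts = ["国足比赛失利", "NBA总决赛今晚打响", "央行宣布降息", "股市收盘大涨"]
gold = [0, 0, 1, 1]  # sports vs. finance (placeholders)

pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(model.encode(texts))
print("v_measure:", v_measure_score(gold, pred))
```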
4. Reranking Tasks
- C-MTEB/CMedQAv1-reranking (Test Split)
- map: 83.63680381780644
- mrr: 86.16476190476192
- C-MTEB/CMedQAv2-reranking (Test Split)
- map: 83.74350667859487
- mrr: 86.10388888888889
- C-MTEB/Mmarco-reranking (Dev Split)
- map: 31.5372713650176
- mrr: 30.163095238095238
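Reranking sorts candidate passages by similarity to the query; map and mrr then score how highly the relevant candidates end up ranked. A sketch of the per-query computation, with placeholder candidates and relevance labels:

```python
# Sketch: rerank candidates by cosine similarity; mrr averages the reciprocal rank
# of the first relevant hit over all queries. Placeholder data only.
import numpy as np
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("jinaai/jina-embeddings-v2-base-zh", trust_remote_code=True)

query = "婴儿发烧怎么办"
candidates = ["宝宝发热的家庭护理方法", "汽车保养的常见误区", "小儿退烧药使用注意事项"]
relevant = {0, 2}  # indices of candidates labeled relevant (placeholders)

scores = cos_sim(model.encode([query]), model.encode(candidates))[0]
ranking = np.argsort(-scores.numpy())  # best candidate first

first_hit = next(rank for rank, idx in enumerate(ranking, start=1) if int(idx) in relevant)
print("reciprocal rank:", 1.0 / first_hit)
```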
5. Retrieval Tasks
- C-MTEB/CmedqaRetrieval (Dev Split)
- map_at_1: 22.072
- map_at_10: 32.942
- map_at_100: 34.768
- map_at_1000: 34.902
- map_at_3: 29.357
- map_at_5: 31.236000000000004
- mrr_at_1: 34.259
- mrr_at_10: 41.957
- mrr_at_100: 42.982
- mrr_at_1000: 43.042
- mrr_at_3: 39.722
- mrr_at_5: 40.898
- ndcg_at_1: 34.259
- ndcg_at_10: 39.153
- ndcg_at_100: 46.493
- ndcg_at_1000: 49.01
- ndcg_at_3: 34.636
- ndcg_at_5: 36.278
- precision_at_1: 34.259
- precision_at_10: 8.815000000000001
- precision_at_100: 1.474
- precision_at_1000: 0.179
- precision_at_3: 19.73
- precision_at_5: 14.174000000000001
- recall_at_1: 22.072
- recall_at_10: 48.484
- recall_at_100: 79.035
- recall_at_1000: 96.15
- recall_at_3: 34.607
- recall_at_5: 40.064
- C-MTEB/CovidRetrieval (Dev Split)
- map_at_1: 69.178
- map_at_10: 77.523
- map_at_100: 77.793
- map_at_1000: 77.79899999999999
- map_at_3: 75.878
- map_at_5: 76.849
- mrr_at_1: 69.44200000000001
- mrr_at_10: 77.55
- mrr_at_100: 77.819
- mrr_at_1000: 77.826
- mrr_at_3: 75.957
- mrr_at_5: 76.916
- ndcg_at_1: 69.44200000000001
- ndcg_at_10: 81.217
- ndcg_at_100: 82.45
- ndcg_at_1000: 82.636
- ndcg_at_3: 77.931
- ndcg_at_5: 79.655
- precision_at_1: 69.44200000000001
- precision_at_10: 9.357
- precision_at_100: 0.993
- precision_at_1000: 0.101
- precision_at_3: 28.1
- precision_at_5: 17.724
- recall_at_1: 69.178
- recall_at_10: 92.624
- recall_at_100: 98.209
- recall_at_1000: 99.684
- recall_at_3: 83.772
- recall_at_5: 87.882
- C-MTEB/DuRetrieval (Dev Split)
- map_at_1: 25.163999999999998
- map_at_10: 76.386
- map_at_100: 79.339
- map_at_1000: 79.39500000000001
- map_at_3: 52.959
- map_at_5: 66.59
- mrr_at_1: 87.9
- mrr_at_10: 91.682
- mrr_at_100: 91.747
- mrr_at_1000: 91.751
- mrr_at_3: 91.267
- mrr_at_5: 91.527
- ndcg_at_1: 87.9
- ndcg_at_10: 84.569
- ndcg_at_100: 87.83800000000001
- ndcg_at_1000: 88.322
- ndcg_at_3: 83.473
- ndcg_at_5: 82.178
- precision_at_1: 87.9
- precision_at_10: 40.605000000000004
- precision_at_100: 4.752
- precision_at_1000: 0.488
- precision_at_3: 74.9
- precision_at_5: 62.96000000000001
- recall_at_1: 25.163999999999998
- recall_at_10: 85.97399999999999
- recall_at_100: 96.63000000000001
- recall_at_1000: 99.016
- recall_at_3: 55.611999999999995
- recall_at_5: 71.936
- C-MTEB/EcomRetrieval (Dev Split)
- map_at_1: 48.6
- map_at_10: 58.831
- map_at_100: 59.427
- map_at_1000: 59.44199999999999
- map_at_3: 56.383
- map_at_5: 57.753
- mrr_at_1: 48.6
- mrr_at_10: 58.831
- mrr_at_100: 59.427
- mrr_at_1000: 59.44199999999999
- mrr_at_3: 56.383
- mrr_at_5: 57.753
- ndcg_at_1: 48.6
- ndcg_at_10: 63.951
- ndcg_at_100: 66.72200000000001
- ndcg_at_1000: 67.13900000000001
- ndcg_at_3: 58.882
- ndcg_at_5: 61.373
- precision_at_1: 48.6
- precision_at_10: 8.01
- precision_at_100: 0.928
- precision_at_1000: 0.096
- precision_at_3: 22.033
- precision_at_5: 14.44
- recall_at_1: 48.6
- recall_at_10: 80.10000000000001
- recall_at_100: 92.80000000000001
- recall_at_1000: 96.1
- recall_at_3: 66.10000000000001
- recall_at_5: 72.2
- C-MTEB/MedicalRetrieval (Dev Split)
- map_at_1: 49.2
- map_at_10: 54.539
- map_at_100: 55.135
- map_at_1000: 55.19199999999999
- map_at_3: 53.383
- map_at_5: 54.142999999999994
- mrr_at_1: 49.2
- mrr_at_10: 54.539
- mrr_at_100: 55.135999999999996
- mrr_at_1000: 55.19199999999999
- mrr_at_3: 53.383
- mrr_at_5: 54.142999999999994
- ndcg_at_1: 49.2
- ndcg_at_10: 57.123000000000005
- ndcg_at_100: 60.21300000000001
- ndcg_at_1000: 61.915
- ndcg_at_3: 54.772
- ndcg_at_5: 56.157999999999994
- precision_at_1: 49.2
- precision_at_10: 6.52
- precision_at_100: 0.8009999999999999
- precision_at_1000: 0.094
- precision_at_3: 19.6
- precision_at_5: 12.44
- recall_at_1: 49.2
- recall_at_10: 65.2
- recall_at_100: 80.10000000000001
- recall_at_1000: 93.89999999999999
- recall_at_3: 58.8
- recall_at_5: 62.2
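Retrieval encodes the corpus once, retrieves the top-k passages per query by cosine similarity, and scores the ranked list. A sketch showing recall_at_k with placeholder data:

```python
# Sketch: corpus encoding, top-k retrieval by cosine similarity, and recall_at_k scoring.
# Corpus, query, and relevance labels are illustrative placeholders only.
import numpy as np
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("jinaai/jina-embeddings-v2-base-zh", trust_remote_code=True)

corpus = ["感冒的常见症状包括咳嗽和发烧", "高血压患者应注意低盐饮食", "今日股市大幅波动"]
corpus_emb = model.encode(corpus)

query = "感冒有哪些症状"
relevant_ids = {0}  # gold passage ids for this query (placeholders)

scores = cos_sim(model.encode([query]), corpus_emb)[0].numpy()
top_k = [int(i) for i in np.argsort(-scores)[:2]]  # k = 2

print("recall_at_2:", len(relevant_ids & set(top_k)) / len(relevant_ids))
```

map_at_k, mrr_at_k, ndcg_at_k, and precision_at_k score the same ranked list, differing only in how they weight rank positions.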
6. PairClassification Tasks
- C-MTEB/CMNLI (Validation Split)
- cos_sim_accuracy: 76.7047504509922
- cos_sim_ap: 85.26649874800871
- cos_sim_f1: 78.13528724646915
- cos_sim_precision: 71.57587548638132
- cos_sim_recall: 86.01823708206688
- dot_accuracy: 70.13830426939266
- dot_ap: 77.01510412382171
- dot_f1: 73.56710042713817
- dot_precision: 63.955094991364426
- dot_recall: 86.57937806873977
- euclidean_accuracy: 75.53818400481059
- euclidean_ap: 84.34668448241264
- euclidean_f1: 77.51741608613047
- euclidean_precision: 70.65614777756399
- euclidean_recall: 85.85457096095394
- manhattan_accuracy: 75.49007817197835
- manhattan_ap: 84.40297506704299
- manhattan_f1: 77.63185324160932
- manhattan_precision: 70.03949595636637
- manhattan_recall: 87.07037643207856
- max_accuracy: 76.7047504509922
- max_ap: 85.26649874800871
- max_f1: 78.13528724646915
- C-MTEB/OCNLI (Validation Split)
- cos_sim_accuracy: 75.69030860855442
- cos_sim_ap: 80.6157833772759
- cos_sim_f1: 77.87524366471735
- cos_sim_precision: 72.3076923076923
- cos_sim_recall: 84.37170010559663
- dot_accuracy: 67.78559826746074
- dot_ap: 72.00871467527499
- dot_f1: 72.58722247394654
- dot_precision: 63.57142857142857
- dot_recall: 84.58289334741288
- euclidean_accuracy: 75.20303194369248
- euclidean_ap: 80.98587256415605
- euclidean_f1: 77.26396917148362
- euclidean_precision: 71.03631532329496
- euclidean_recall: 84.68848996832101
- manhattan_accuracy: 75.20303194369248
- manhattan_ap: 80.93460699513219
- manhattan_f1: 77.124773960217
- manhattan_precision: 67.43083003952569
- manhattan_recall: 90.07391763463569
- max_accuracy: 75.69030860855442
- max_ap: 80.98587256415605
- max_f1: 77.87524366471735
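Pair classification scores each sentence pair (for example, with cosine similarity) and thresholds that score to predict whether the pair matches; ap is threshold-free, while f1/precision/recall depend on the chosen threshold. A sketch with placeholder pairs, labels, and an illustrative threshold:

```python
# Sketch: score pairs with cosine similarity, report AP and a thresholded F1.
# Pairs, labels, and the 0.75 threshold are illustrative placeholders only.
from sklearn.metrics import average_precision_score, f1_score
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("jinaai/jina-embeddings-v2-base-zh", trust_remote_code=True)

pairs = [("他在北京工作", "他的工作地点是北京"),
         ("她喜欢古典音乐", "她讨厌所有音乐"),
         ("明天会下雨", "天气预报说明天有雨")]
labels = [1, 0, 1]

emb_a = model.encode([a for a, _ in pairs])
emb_b = model.encode([b for _, b in pairs])
scores = [float(cos_sim(a, b)) for a, b in zip(emb_a, emb_b)]

print("cos_sim_ap:", average_precision_score(labels, scores))
print("cos_sim_f1:", f1_score(labels, [int(s > 0.75) for s in scores]))
```

The dot_*, euclidean_*, and manhattan_* rows repeat the evaluation with those scoring functions, and the max_* rows report the best result across them.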
📄 License
This model is released under the Apache-2.0 license.