Xiaobu Embedding
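Developed by lier007
xiaobu-embedding is a multi-task embedding model for Chinese natural language processing tasks, including text similarity, classification, clustering, and retrieval.
Downloads: 147
Released: 1/9/2024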
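Model Overview
The model learns embedding representations of Chinese text. It produces high-quality text vectors for downstream tasks such as semantic similarity, text classification, and information retrieval.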
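Model Features
Multi-task support
Supports multiple natural language processing tasks, including text similarity, classification, clustering, and retrieval.
Optimized for Chinese
Tuned specifically for Chinese text to better capture Chinese semantic features.
High performance
Performs well on several Chinese benchmarks, particularly on medical-domain tasks.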
Model Capabilities
Text embedding
Semantic similarity
Text classification
Text clustering
Information retrieval
Reranking
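Use Cases
Medical
Medical question-answer retrieval
Retrieves answers to medical questions
MAP@10 of 37.604 on the CMedQA retrieval task
Medical document reranking
Reranks medical documents by relevance
MAP of 87.57 on the CMedQAv2 reranking task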
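E-commerce
Product review classification
Classifies product reviews by sentiment and topic
Accuracy of 86.74% on the JD review classification task
Product retrieval
Product search on e-commerce platforms
MAP@10 of 63.14 on the EcomRetrieval task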
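General NLP
Text similarity
Computes the semantic similarity of two texts
Cosine-similarity Pearson correlation of 79.75 on the STSB task
Text classification
Multi-class text classification
Accuracy of 49.74% on the IFlyTek classification task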
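🚀 xiaobu-embedding
The model is a multi-task fine-tune of the GTE model. Training used several kinds of data: chat-style Query-Query pairs, knowledge-style Query-Doc pairs, and the open-source BGE Query-Doc data. Positive examples were cleaned and medium-difficulty negatives were mined. The training set totals 6M examples, with an emphasis on data quality.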
🚀 Quick Start
Install dependencies
pip install -U sentence-transformers
Similarity example
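from sentence_transformers import SentenceTransformer
# Two batches of sentences to compare (placeholder sample text)
sentences_1 = ["樣例數據-1", "樣例數據-2"]
sentences_2 = ["樣例數據-3", "樣例數據-4"]
# Load the model by its Hugging Face Hub ID
model = SentenceTransformer('lier007/xiaobu-embedding')
# normalize_embeddings=True returns unit-length vectors
embeddings_1 = model.encode(sentences_1, normalize_embeddings=True)
embeddings_2 = model.encode(sentences_2, normalize_embeddings=True)
# similarity[i][j] is the cosine similarity of sentences_1[i] and sentences_2[j]
similarity = embeddings_1 @ embeddings_2.T
print(similarity)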
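Because `normalize_embeddings=True` yields unit-length vectors, the matrix product in the example is a cosine-similarity matrix. The following is a minimal sketch of that relationship using only the standard library. The three-dimensional vectors stand in for real model output, and the helper names `l2_normalize`, `dot`, and `cosine` are illustrative, not part of sentence-transformers.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length, as normalize_embeddings=True does per embedding."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity computed directly from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy "embeddings" (assumed values); real embeddings have many more dimensions.
query = [1.0, 2.0, 2.0]
docs = {
    "doc_a": [2.0, 4.0, 4.0],
    "doc_b": [2.0, 0.0, 1.0],
    "doc_c": [-1.0, 0.0, 0.5],
}

q = l2_normalize(query)
scores = {name: dot(q, l2_normalize(vec)) for name, vec in docs.items()}

# The dot product of normalized vectors matches the cosine of the originals.
for name, vec in docs.items():
    assert abs(scores[name] - cosine(query, vec)) < 1e-12

# Rank documents for the query, most similar first.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The same ranking step applies to real embeddings: encode the query and the candidates, then sort candidates by their similarity score.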
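✨ Key Features
- Base model: a multi-task fine-tune of the GTE model[1].
- Training data: chat-style Query-Query pairs, knowledge-style Query-Doc pairs, and the open-source BGE Query-Doc data[2]; positives were cleaned and medium-difficulty negatives were mined; 6M examples in total, with an emphasis on data quality.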
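The model card does not say how the medium-difficulty negatives were selected. As an illustration only, one common approach is to rank candidate documents by similarity to the query, skip the top few non-positives (which may be unlabeled positives), and sample from the next rank window. The function name `mine_medium_negatives` and the `skip_top` and `window` parameters below are hypothetical and are not taken from the authors' pipeline.

```python
def mine_medium_negatives(scored_candidates, positives, skip_top=2, window=3):
    """Pick negatives from a middle rank window.

    scored_candidates: list of (doc_id, similarity_to_query) pairs.
    positives: set of doc_ids known to be relevant to the query.
    skip_top, window: hypothetical knobs, not values from the model card.
    """
    # Drop known positives, then sort the rest from most to least similar.
    ranked = sorted(
        (c for c in scored_candidates if c[0] not in positives),
        key=lambda c: c[1],
        reverse=True,
    )
    # Skip the highest-scoring non-positives, then keep the next `window` items.
    return [doc_id for doc_id, _ in ranked[skip_top:skip_top + window]]

# Made-up similarity scores for one query.
candidates = [
    ("d1", 0.95), ("d2", 0.91), ("d3", 0.88), ("d4", 0.70),
    ("d5", 0.64), ("d6", 0.52), ("d7", 0.20), ("d8", 0.05),
]
negatives = mine_medium_negatives(candidates, positives={"d1"})
print(negatives)  # ['d4', 'd5', 'd6']
```

Negatives from the middle of the ranking are harder than random documents but less likely to be mislabeled than the very top results, which is the usual motivation for this kind of window.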
📚 Documentation
Evaluation
See the BGE Chinese C-MTEB evaluation[2].
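The STS rows in the metrics table report `cos_sim_pearson` and `cos_sim_spearman`: correlations between the model's cosine similarities and human similarity labels, which appear to be given on a 0 to 100 scale. The sketch below shows how the two correlations differ. It uses made-up scores and labels, and the helpers are written from the textbook definitions rather than taken from the evaluation code.

```python
import math

def pearson(x, y):
    """Pearson correlation: linear agreement between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

predicted = [0.10, 0.35, 0.40, 0.80, 0.95]  # made-up cosine similarities
gold = [0.0, 1.0, 3.0, 4.0, 5.0]            # made-up human labels

p = pearson(predicted, gold)
s = spearman(predicted, gold)
print(round(p, 4), round(s, 4))
```

Here the predicted scores order the pairs exactly as the labels do, so Spearman is 1 while Pearson is lower because the relationship is not perfectly linear.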
Fine-tuning
See the BGE fine-tuning module[2].
📄 Evaluation Metrics
Task type | Dataset | Metric | Value |
---|---|---|---|
STS | C-MTEB/AFQMC | cos_sim_pearson | 49.37874132528482 |
STS | C-MTEB/AFQMC | cos_sim_spearman | 54.84722470052176 |
STS | C-MTEB/AFQMC | euclidean_pearson | 53.0495882931575 |
STS | C-MTEB/AFQMC | euclidean_spearman | 54.847727301700665 |
STS | C-MTEB/AFQMC | manhattan_pearson | 53.0632140838278 |
STS | C-MTEB/AFQMC | manhattan_spearman | 54.8744258024692 |
STS | C-MTEB/ATEC | cos_sim_pearson | 48.15992903013723 |
STS | C-MTEB/ATEC | cos_sim_spearman | 55.13198035464577 |
STS | C-MTEB/ATEC | euclidean_pearson | 55.435876753245715 |
STS | C-MTEB/ATEC | euclidean_spearman | 55.13215936702871 |
STS | C-MTEB/ATEC | manhattan_pearson | 55.41429518223402 |
STS | C-MTEB/ATEC | manhattan_spearman | 55.13363087679285 |
Classification | mteb/amazon_reviews_multi | accuracy | 46.722 |
Classification | mteb/amazon_reviews_multi | f1 | 45.039340641893205 |
STS | C-MTEB/BQ | cos_sim_pearson | 63.517830355554224 |
STS | C-MTEB/BQ | cos_sim_spearman | 65.57007801018649 |
STS | C-MTEB/BQ | euclidean_pearson | 64.05153340906585 |
STS | C-MTEB/BQ | euclidean_spearman | 65.5696865661119 |
STS | C-MTEB/BQ | manhattan_pearson | 63.95710619755406 |
STS | C-MTEB/BQ | manhattan_spearman | 65.48565785379489 |
Clustering | C-MTEB/CLSClusteringP2P | v_measure | 43.24046498507819 |
Clustering | C-MTEB/CLSClusteringS2S | v_measure | 41.22618199372116 |
Reranking | C-MTEB/CMedQAv1-reranking | map | 87.12213224673621 |
Reranking | C-MTEB/CMedQAv1-reranking | mrr | 89.57150793650794 |
Reranking | C-MTEB/CMedQAv2-reranking | map | 87.57290061886421 |
Reranking | C-MTEB/CMedQAv2-reranking | mrr | 90.19202380952382 |
Retrieval | C-MTEB/CmedqaRetrieval | map_at_1 | 25.22 |
Retrieval | C-MTEB/CmedqaRetrieval | map_at_10 | 37.604 |
Retrieval | C-MTEB/CmedqaRetrieval | map_at_100 | 39.501 |
Retrieval | C-MTEB/CmedqaRetrieval | map_at_1000 | 39.614 |
Retrieval | C-MTEB/CmedqaRetrieval | map_at_3 | 33.378 |
Retrieval | C-MTEB/CmedqaRetrieval | map_at_5 | 35.774 |
Retrieval | C-MTEB/CmedqaRetrieval | mrr_at_1 | 38.385000000000005 |
Retrieval | C-MTEB/CmedqaRetrieval | mrr_at_10 | 46.487 |
Retrieval | C-MTEB/CmedqaRetrieval | mrr_at_100 | 47.504999999999995 |
Retrieval | C-MTEB/CmedqaRetrieval | mrr_at_1000 | 47.548 |
Retrieval | C-MTEB/CmedqaRetrieval | mrr_at_3 | 43.885999999999996 |
Retrieval | C-MTEB/CmedqaRetrieval | mrr_at_5 | 45.373000000000005 |
Retrieval | C-MTEB/CmedqaRetrieval | ndcg_at_1 | 38.385000000000005 |
Retrieval | C-MTEB/CmedqaRetrieval | ndcg_at_10 | 44.224999999999994 |
Retrieval | C-MTEB/CmedqaRetrieval | ndcg_at_100 | 51.637 |
Retrieval | C-MTEB/CmedqaRetrieval | ndcg_at_1000 | 53.55799999999999 |
Retrieval | C-MTEB/CmedqaRetrieval | ndcg_at_3 | 38.845 |
Retrieval | C-MTEB/CmedqaRetrieval | ndcg_at_5 | 41.163 |
Retrieval | C-MTEB/CmedqaRetrieval | precision_at_1 | 38.385000000000005 |
Retrieval | C-MTEB/CmedqaRetrieval | precision_at_10 | 9.812 |
Retrieval | C-MTEB/CmedqaRetrieval | precision_at_100 | 1.58 |
Retrieval | C-MTEB/CmedqaRetrieval | precision_at_1000 | 0.183 |
Retrieval | C-MTEB/CmedqaRetrieval | precision_at_3 | 21.88 |
Retrieval | C-MTEB/CmedqaRetrieval | precision_at_5 | 15.974 |
Retrieval | C-MTEB/CmedqaRetrieval | recall_at_1 | 25.22 |
Retrieval | C-MTEB/CmedqaRetrieval | recall_at_10 | 54.897 |
Retrieval | C-MTEB/CmedqaRetrieval | recall_at_100 | 85.469 |
Retrieval | C-MTEB/CmedqaRetrieval | recall_at_1000 | 98.18599999999999 |
Retrieval | C-MTEB/CmedqaRetrieval | recall_at_3 | 38.815 |
Retrieval | C-MTEB/CmedqaRetrieval | recall_at_5 | 45.885 |
PairClassification | C-MTEB/CMNLI | cos_sim_accuracy | 83.22309079975948 |
PairClassification | C-MTEB/CMNLI | cos_sim_ap | 89.94833400328307 |
PairClassification | C-MTEB/CMNLI | cos_sim_f1 | 84.39319055464031 |
PairClassification | C-MTEB/CMNLI | cos_sim_precision | 79.5774647887324 |
PairClassification | C-MTEB/CMNLI | cos_sim_recall | 89.82931961655366 |
PairClassification | C-MTEB/CMNLI | dot_accuracy | 83.22309079975948 |
PairClassification | C-MTEB/CMNLI | dot_ap | 89.95618559578415 |
PairClassification | C-MTEB/CMNLI | dot_f1 | 84.41173239591345 |
PairClassification | C-MTEB/CMNLI | dot_precision | 79.61044343141317 |
PairClassification | C-MTEB/CMNLI | dot_recall | 89.82931961655366 |
PairClassification | C-MTEB/CMNLI | euclidean_accuracy | 83.23511725796753 |
PairClassification | C-MTEB/CMNLI | euclidean_ap | 89.94836342787318 |
PairClassification | C-MTEB/CMNLI | euclidean_f1 | 84.40550133096718 |
PairClassification | C-MTEB/CMNLI | euclidean_precision | 80.29120067524794 |
PairClassification | C-MTEB/CMNLI | euclidean_recall | 88.9642272620996 |
PairClassification | C-MTEB/CMNLI | manhattan_accuracy | 83.23511725796753 |
PairClassification | C-MTEB/CMNLI | manhattan_ap | 89.9450103956978 |
PairClassification | C-MTEB/CMNLI | manhattan_f1 | 84.44444444444444 |
PairClassification | C-MTEB/CMNLI | manhattan_precision | 80.09647651006712 |
PairClassification | C-MTEB/CMNLI | manhattan_recall | 89.29155950432546 |
PairClassification | C-MTEB/CMNLI | max_accuracy | 83.23511725796753 |
PairClassification | C-MTEB/CMNLI | max_ap | 89.95618559578415 |
PairClassification | C-MTEB/CMNLI | max_f1 | 84.44444444444444 |
Retrieval | C-MTEB/CovidRetrieval | map_at_1 | 76.87 |
Retrieval | C-MTEB/CovidRetrieval | map_at_10 | 84.502 |
Retrieval | C-MTEB/CovidRetrieval | map_at_100 | 84.615 |
Retrieval | C-MTEB/CovidRetrieval | map_at_1000 | 84.617 |
Retrieval | C-MTEB/CovidRetrieval | map_at_3 | 83.127 |
Retrieval | C-MTEB/CovidRetrieval | map_at_5 | 83.99600000000001 |
Retrieval | C-MTEB/CovidRetrieval | mrr_at_1 | 77.02799999999999 |
Retrieval | C-MTEB/CovidRetrieval | mrr_at_10 | 84.487 |
Retrieval | C-MTEB/CovidRetrieval | mrr_at_100 | 84.59299999999999 |
Retrieval | C-MTEB/CovidRetrieval | mrr_at_1000 | 84.59400000000001 |
Retrieval | C-MTEB/CovidRetrieval | mrr_at_3 | 83.193 |
Retrieval | C-MTEB/CovidRetrieval | mrr_at_5 | 83.994 |
Retrieval | C-MTEB/CovidRetrieval | ndcg_at_1 | 77.134 |
Retrieval | C-MTEB/CovidRetrieval | ndcg_at_10 | 87.68599999999999 |
Retrieval | C-MTEB/CovidRetrieval | ndcg_at_100 | 88.17099999999999 |
Retrieval | C-MTEB/CovidRetrieval | ndcg_at_1000 | 88.21 |
Retrieval | C-MTEB/CovidRetrieval | ndcg_at_3 | 84.993 |
Retrieval | C-MTEB/CovidRetrieval | ndcg_at_5 | 86.519 |
Retrieval | C-MTEB/CovidRetrieval | precision_at_1 | 77.134 |
Retrieval | C-MTEB/CovidRetrieval | precision_at_10 | 9.841999999999999 |
Retrieval | C-MTEB/CovidRetrieval | precision_at_100 | 1.006 |
Retrieval | C-MTEB/CovidRetrieval | precision_at_1000 | 0.101 |
Retrieval | C-MTEB/CovidRetrieval | precision_at_3 | 30.313000000000002 |
Retrieval | C-MTEB/CovidRetrieval | precision_at_5 | 18.945999999999998 |
Retrieval | C-MTEB/CovidRetrieval | recall_at_1 | 76.87 |
Retrieval | C-MTEB/CovidRetrieval | recall_at_10 | 97.418 |
Retrieval | C-MTEB/CovidRetrieval | recall_at_100 | 99.579 |
Retrieval | C-MTEB/CovidRetrieval | recall_at_1000 | 99.895 |
Retrieval | C-MTEB/CovidRetrieval | recall_at_3 | 90.227 |
Retrieval | C-MTEB/CovidRetrieval | recall_at_5 | 93.888 |
Retrieval | C-MTEB/DuRetrieval | map_at_1 | 25.941 |
Retrieval | C-MTEB/DuRetrieval | map_at_10 | 78.793 |
Retrieval | C-MTEB/DuRetrieval | map_at_100 | 81.57799999999999 |
Retrieval | C-MTEB/DuRetrieval | map_at_1000 | 81.626 |
Retrieval | C-MTEB/DuRetrieval | map_at_3 | 54.749 |
Retrieval | C-MTEB/DuRetrieval | map_at_5 | 69.16 |
Retrieval | C-MTEB/DuRetrieval | mrr_at_1 | 90.45 |
Retrieval | C-MTEB/DuRetrieval | mrr_at_10 | 93.406 |
Retrieval | C-MTEB/DuRetrieval | mrr_at_100 | 93.453 |
Retrieval | C-MTEB/DuRetrieval | mrr_at_1000 | 93.45700000000001 |
Retrieval | C-MTEB/DuRetrieval | mrr_at_3 | 93.10000000000001 |
Retrieval | C-MTEB/DuRetrieval | mrr_at_5 | 93.27499999999999 |
Retrieval | C-MTEB/DuRetrieval | ndcg_at_1 | 90.45 |
Retrieval | C-MTEB/DuRetrieval | ndcg_at_10 | 86.44500000000001 |
Retrieval | C-MTEB/DuRetrieval | ndcg_at_100 | 89.28399999999999 |
Retrieval | C-MTEB/DuRetrieval | ndcg_at_1000 | 89.739 |
Retrieval | C-MTEB/DuRetrieval | ndcg_at_3 | 85.62100000000001 |
Retrieval | C-MTEB/DuRetrieval | ndcg_at_5 | 84.441 |
Retrieval | C-MTEB/DuRetrieval | precision_at_1 | 90.45 |
Retrieval | C-MTEB/DuRetrieval | precision_at_10 | 41.19 |
Retrieval | C-MTEB/DuRetrieval | precision_at_100 | 4.761 |
Retrieval | C-MTEB/DuRetrieval | precision_at_1000 | 0.48700000000000004 |
Retrieval | C-MTEB/DuRetrieval | precision_at_3 | 76.583 |
Retrieval | C-MTEB/DuRetrieval | precision_at_5 | 64.68 |
Retrieval | C-MTEB/DuRetrieval | recall_at_1 | 25.941 |
Retrieval | C-MTEB/DuRetrieval | recall_at_10 | 87.443 |
Retrieval | C-MTEB/DuRetrieval | recall_at_100 | 96.54 |
Retrieval | C-MTEB/DuRetrieval | recall_at_1000 | 98.906 |
Retrieval | C-MTEB/DuRetrieval | recall_at_3 | 56.947 |
Retrieval | C-MTEB/DuRetrieval | recall_at_5 | 73.714 |
Retrieval | C-MTEB/EcomRetrieval | map_at_1 | 52.900000000000006 |
Retrieval | C-MTEB/EcomRetrieval | map_at_10 | 63.144 |
Retrieval | C-MTEB/EcomRetrieval | map_at_100 | 63.634 |
Retrieval | C-MTEB/EcomRetrieval | map_at_1000 | 63.644999999999996 |
Retrieval | C-MTEB/EcomRetrieval | map_at_3 | 60.817 |
Retrieval | C-MTEB/EcomRetrieval | map_at_5 | 62.202 |
Retrieval | C-MTEB/EcomRetrieval | mrr_at_1 | 52.900000000000006 |
Retrieval | C-MTEB/EcomRetrieval | mrr_at_10 | 63.144 |
Retrieval | C-MTEB/EcomRetrieval | mrr_at_100 | 63.634 |
Retrieval | C-MTEB/EcomRetrieval | mrr_at_1000 | 63.644999999999996 |
Retrieval | C-MTEB/EcomRetrieval | mrr_at_3 | 60.817 |
Retrieval | C-MTEB/EcomRetrieval | mrr_at_5 | 62.202 |
Retrieval | C-MTEB/EcomRetrieval | ndcg_at_1 | 52.900000000000006 |
Retrieval | C-MTEB/EcomRetrieval | ndcg_at_10 | 68.042 |
Retrieval | C-MTEB/EcomRetrieval | ndcg_at_100 | 70.417 |
Retrieval | C-MTEB/EcomRetrieval | ndcg_at_1000 | 70.722 |
Retrieval | C-MTEB/EcomRetrieval | ndcg_at_3 | 63.287000000000006 |
Retrieval | C-MTEB/EcomRetrieval | ndcg_at_5 | 65.77 |
Retrieval | C-MTEB/EcomRetrieval | precision_at_1 | 52.900000000000006 |
Retrieval | C-MTEB/EcomRetrieval | precision_at_10 | 8.34 |
Retrieval | C-MTEB/EcomRetrieval | precision_at_100 | 0.9450000000000001 |
Retrieval | C-MTEB/EcomRetrieval | precision_at_1000 | 0.097 |
Retrieval | C-MTEB/EcomRetrieval | precision_at_3 | 23.467 |
Retrieval | C-MTEB/EcomRetrieval | precision_at_5 | 15.28 |
Retrieval | C-MTEB/EcomRetrieval | recall_at_1 | 52.900000000000006 |
Retrieval | C-MTEB/EcomRetrieval | recall_at_10 | 83.39999999999999 |
Retrieval | C-MTEB/EcomRetrieval | recall_at_100 | 94.5 |
Retrieval | C-MTEB/EcomRetrieval | recall_at_1000 | 96.89999999999999 |
Retrieval | C-MTEB/EcomRetrieval | recall_at_3 | 70.39999999999999 |
Retrieval | C-MTEB/EcomRetrieval | recall_at_5 | 76.4 |
Classification | C-MTEB/IFlyTek-classification | accuracy | 49.74220854174683 |
Classification | C-MTEB/IFlyTek-classification | f1 | 38.01399980618159 |
Classification | C-MTEB/JDReview-classification | accuracy | 86.73545966228893 |
Classification | C-MTEB/JDReview-classification | ap | 55.72394235169542 |
Classification | C-MTEB/JDReview-classification | f1 | 81.58550390953492 |
STS | C-MTEB/LCQMC | cos_sim_pearson | 69.96711977441642 |
STS | C-MTEB/LCQMC | cos_sim_spearman | 75.54747609685569 |
STS | C-MTEB/LCQMC | euclidean_pearson | 74.62663478056035 |
STS | C-MTEB/LCQMC | euclidean_spearman | 75.54761576699639 |
STS | C-MTEB/LCQMC | manhattan_pearson | 74.60983904582241 |
STS | C-MTEB/LCQMC | manhattan_spearman | 75.52758938061503 |
Reranking | C-MTEB/Mmarco-reranking | map | 28.076927649720986 |
Reranking | C-MTEB/Mmarco-reranking | mrr | 26.98015873015873 |
Retrieval | C-MTEB/MMarcoRetrieval | map_at_1 | 65.58 |
Retrieval | C-MTEB/MMarcoRetrieval | map_at_10 | 74.763 |
Retrieval | C-MTEB/MMarcoRetrieval | map_at_100 | 75.077 |
Retrieval | C-MTEB/MMarcoRetrieval | map_at_1000 | 75.091 |
Retrieval | C-MTEB/MMarcoRetrieval | map_at_3 | 72.982 |
Retrieval | C-MTEB/MMarcoRetrieval | map_at_5 | 74.155 |
Retrieval | C-MTEB/MMarcoRetrieval | mrr_at_1 | 67.822 |
Retrieval | C-MTEB/MMarcoRetrieval | mrr_at_10 | 75.437 |
Retrieval | C-MTEB/MMarcoRetrieval | mrr_at_100 | 75.702 |
Retrieval | C-MTEB/MMarcoRetrieval | mrr_at_1000 | 75.715 |
Retrieval | C-MTEB/MMarcoRetrieval | mrr_at_3 | 73.91799999999999 |
Retrieval | C-MTEB/MMarcoRetrieval | mrr_at_5 | 74.909 |
Retrieval | C-MTEB/MMarcoRetrieval | ndcg_at_1 | 67.822 |
Retrieval | C-MTEB/MMarcoRetrieval | ndcg_at_10 | 78.472 |
Retrieval | C-MTEB/MMarcoRetrieval | ndcg_at_100 | 79.891 |
Retrieval | C-MTEB/MMarcoRetrieval | ndcg_at_1000 | 80.262 |
Retrieval | C-MTEB/MMarcoRetrieval | ndcg_at_3 | 75.138 |
Retrieval | C-MTEB/MMarcoRetrieval | ndcg_at_5 | 77.094 |
Retrieval | C-MTEB/MMarcoRetrieval | precision_at_1 | 67.822 |
Retrieval | C-MTEB/MMarcoRetrieval | precision_at_10 | 9.474 |
Retrieval | C-MTEB/MMarcoRetrieval | precision_at_100 | 1.019 |
Retrieval | C-MTEB/MMarcoRetrieval | precision_at_1000 | 0.105 |
Retrieval | C-MTEB/MMarcoRetrieval | precision_at_3 | 28.281 |
Retrieval | C-MTEB/MMarcoRetrieval | precision_at_5 | 18.017 |
Retrieval | C-MTEB/MMarcoRetrieval | recall_at_1 | 65.58 |
Retrieval | C-MTEB/MMarcoRetrieval | recall_at_10 | 89.18599999999999 |
Retrieval | C-MTEB/MMarcoRetrieval | recall_at_100 | 95.64399999999999 |
Retrieval | C-MTEB/MMarcoRetrieval | recall_at_1000 | 98.541 |
Retrieval | C-MTEB/MMarcoRetrieval | recall_at_3 | 80.455 |
Retrieval | C-MTEB/MMarcoRetrieval | recall_at_5 | 85.063 |
Classification | mteb/amazon_massive_intent | accuracy | 72.86819098856758 |
Classification | mteb/amazon_massive_intent | f1 | 70.25369778283451 |
Classification | mteb/amazon_massive_scenario | accuracy | 75.46738399462004 |
Classification | mteb/amazon_massive_scenario | f1 | 75.02466838130249 |
Retrieval | C-MTEB/MedicalRetrieval | map_at_1 | 53.300000000000004 |
Retrieval | C-MTEB/MedicalRetrieval | map_at_10 | 60.072 |
Retrieval | C-MTEB/MedicalRetrieval | map_at_100 | 60.618 |
Retrieval | C-MTEB/MedicalRetrieval | map_at_1000 | 60.659 |
Retrieval | C-MTEB/MedicalRetrieval | map_at_3 | 58.550000000000004 |
Retrieval | C-MTEB/MedicalRetrieval | map_at_5 | 59.425 |
Retrieval | C-MTEB/MedicalRetrieval | mrr_at_1 | 53.5 |
Retrieval | C-MTEB/MedicalRetrieval | mrr_at_10 | 60.187999999999995 |
Retrieval | C-MTEB/MedicalRetrieval | mrr_at_100 | 60.73499999999999 |
Retrieval | C-MTEB/MedicalRetrieval | mrr_at_1000 | 60.775999999999996 |
Retrieval | C-MTEB/MedicalRetrieval | mrr_at_3 | 58.667 |
Retrieval | C-MTEB/MedicalRetrieval | mrr_at_5 | 59.541999999999994 |
Retrieval | C-MTEB/MedicalRetrieval | ndcg_at_1 | 53.300000000000004 |
Retrieval | C-MTEB/MedicalRetrieval | ndcg_at_10 | 63.376999999999995 |
Retrieval | C-MTEB/MedicalRetrieval | ndcg_at_100 | 66.24600000000001 |
Retrieval | C-MTEB/MedicalRetrieval | ndcg_at_1000 | 67.408 |
Retrieval | C-MTEB/MedicalRetrieval | ndcg_at_3 | 60.211000000000006 |
Retrieval | C-MTEB/MedicalRetrieval | ndcg_at_5 | 61.781 |
Retrieval | C-MTEB/MedicalRetrieval | precision_at_1 | 53.300000000000004 |
Retrieval | C-MTEB/MedicalRetrieval | precision_at_10 | 7.380000000000001 |
Retrieval | C-MTEB/MedicalRetrieval | precision_at_100 | 0.877 |
Retrieval | C-MTEB/MedicalRetrieval | precision_at_1000 | 0.097 |
Retrieval | C-MTEB/MedicalRetrieval | precision_at_3 | 21.667 |
Retrieval | C-MTEB/MedicalRetrieval | precision_at_5 | 13.76 |
Retrieval | C-MTEB/MedicalRetrieval | recall_at_1 | 53.300000000000004 |
Retrieval | C-MTEB/MedicalRetrieval | recall_at_10 | 73.8 |
Retrieval | C-MTEB/MedicalRetrieval | recall_at_100 | 87.7 |
Retrieval | C-MTEB/MedicalRetrieval | recall_at_1000 | 97.0 |
Retrieval | C-MTEB/MedicalRetrieval | recall_at_3 | 65.0 |
Retrieval | C-MTEB/MedicalRetrieval | recall_at_5 | 68.8 |
Classification | C-MTEB/MultilingualSentiment-classification | accuracy | 76.27666666666667 |
Classification | C-MTEB/MultilingualSentiment-classification | f1 | 76.31280038435165 |
PairClassification | C-MTEB/OCNLI | cos_sim_accuracy | 78.72225230102869 |
PairClassification | C-MTEB/OCNLI | cos_sim_ap | 80.63941899467723 |
PairClassification | C-MTEB/OCNLI | cos_sim_f1 | 80.52190121155638 |
PairClassification | C-MTEB/OCNLI | cos_sim_precision | 72.06005004170142 |
PairClassification | C-MTEB/OCNLI | cos_sim_recall | 91.23548046462513 |
PairClassification | C-MTEB/OCNLI | dot_accuracy | 78.72225230102869 |
PairClassification | C-MTEB/OCNLI | dot_ap | 80.63913939812744 |
PairClassification | C-MTEB/OCNLI | dot_f1 | 80.51948051948052 |
PairClassification | C-MTEB/OCNLI | dot_precision | 71.7948717948718 |
PairClassification | C-MTEB/OCNLI | dot_recall | 91.65786694825766 |
PairClassification | C-MTEB/OCNLI | euclidean_accuracy | 78.72225230102869 |
PairClassification | C-MTEB/OCNLI | euclidean_ap | 80.64403797436798 |
PairClassification | C-MTEB/OCNLI | euclidean_f1 | 80.52190121155638 |
PairClassification | C-MTEB/OCNLI | euclidean_precision | 72.06005004170142 |
PairClassification | C-MTEB/OCNLI | euclidean_recall | 91.23548046462513 |
PairClassification | C-MTEB/OCNLI | manhattan_accuracy | 78.18083378451544 |
PairClassification | C-MTEB/OCNLI | manhattan_ap | 80.5241189302444 |
PairClassification | C-MTEB/OCNLI | manhattan_f1 | 80.43478260869566 |
PairClassification | C-MTEB/OCNLI | manhattan_precision | 72.7972626176219 |
PairClassification | C-MTEB/OCNLI | manhattan_recall | 89.86272439281943 |
PairClassification | C-MTEB/OCNLI | max_accuracy | 78.72225230102869 |
PairClassification | C-MTEB/OCNLI | max_ap | 80.64403797436798 |
PairClassification | C-MTEB/OCNLI | max_f1 | 80.52190121155638 |
Classification | C-MTEB/OnlineShopping-classification | accuracy | 92.49000000000001 |
Classification | C-MTEB/OnlineShopping-classification | ap | 90.66330807324402 |
Classification | C-MTEB/OnlineShopping-classification | f1 | 92.48245049107115 |
STS | C-MTEB/PAWSX | cos_sim_pearson | 33.6275431596535 |
STS | C-MTEB/PAWSX | cos_sim_spearman | 37.865700050451494 |
STS | C-MTEB/PAWSX | euclidean_pearson | 38.1050665279388 |
STS | C-MTEB/PAWSX | euclidean_spearman | 37.864125056066364 |
STS | C-MTEB/PAWSX | manhattan_pearson | 38.11206873232881 |
STS | C-MTEB/PAWSX | manhattan_spearman | 37.852977098473936 |
STS | C-MTEB/QBQTC | cos_sim_pearson | 32.137955501231104 |
STS | C-MTEB/QBQTC | cos_sim_spearman | 33.68610910423116 |
STS | C-MTEB/QBQTC | euclidean_pearson | 32.155444753547926 |
STS | C-MTEB/QBQTC | euclidean_spearman | 33.685799252964124 |
STS | C-MTEB/QBQTC | manhattan_pearson | 32.14490855334317 |
STS | C-MTEB/QBQTC | manhattan_spearman | 33.656549820048554 |
STS | mteb/sts22-crosslingual-sts | cos_sim_pearson | 63.63884916818661 |
STS | mteb/sts22-crosslingual-sts | cos_sim_spearman | 64.3217581571435 |
STS | mteb/sts22-crosslingual-sts | euclidean_pearson | 63.475760085926055 |
STS | mteb/sts22-crosslingual-sts | euclidean_spearman | 64.31638169371887 |
STS | mteb/sts22-crosslingual-sts | manhattan_pearson | 64.39677572604752 |
STS | mteb/sts22-crosslingual-sts | manhattan_spearman | 64.85585019406021 |
STS | C-MTEB/STSB | cos_sim_pearson | 79.74698333415277 |
STS | C-MTEB/STSB | cos_sim_spearman | 81.1850043859317 |
STS | C-MTEB/STSB | euclidean_pearson | 80.94512578669881 |
STS | C-MTEB/STSB | euclidean_spearman | 81.18825478390181 |
STS | C-MTEB/STSB | manhattan_pearson | 80.88114336824758 |
STS | C-MTEB/STSB | manhattan_spearman | 81.12266715583868 |
Reranking | C-MTEB/T2Reranking | map | 66.59971552953814 |
Reranking | C-MTEB/T2Reranking | mrr | 76.42177408088038 |
Retrieval | C-MTEB/T2Retrieval | map_at_1 | 28.825 |
Retrieval | C-MTEB/T2Retrieval | map_at_10 | 77.48899999999999 |
Retrieval | C-MTEB/T2Retrieval | map_at_100 | 81.144 |
Retrieval | C-MTEB/T2Retrieval | map_at_1000 | 81.216 |
Retrieval | C-MTEB/T2Retrieval | map_at_3 | 55.435 |
Retrieval | C-MTEB/T2Retrieval | map_at_5 | 67.496 |
Retrieval | C-MTEB/T2Retrieval | mrr_at_1 | 91.377 |
Retrieval | C-MTEB/T2Retrieval | mrr_at_10 | 94.062 |
Retrieval | C-MTEB/T2Retrieval | mrr_at_100 | 94.122 |
Retrieval | C-MTEB/T2Retrieval | mrr_at_1000 | 94.123 |
Retrieval | C-MTEB/T2Retrieval | mrr_at_3 | 93.709 |
Retrieval | C-MTEB/T2Retrieval | mrr_at_5 | 93.932 |
Retrieval | C-MTEB/T2Retrieval | ndcg_at_1 | 91.377 |
Retrieval | C-MTEB/T2Retrieval | ndcg_at_10 | 85.44800000000001 |
Retrieval | C-MTEB/T2Retrieval | ndcg_at_100 | 89.11099999999999 |
Retrieval | C-MTEB/T2Retrieval | ndcg_at_1000 | 89.752 |
Retrieval | C-MTEB/T2Retrieval | ndcg_at_3 | 87.262 |
Retrieval | C-MTEB/T2Retrieval | ndcg_at_5 | 85.668 |
Retrieval | C-MTEB/T2Retrieval | precision_at_1 | 91.377 |
Retrieval | C-MTEB/T2Retrieval | precision_at_10 | 41.525 |
Retrieval | C-MTEB/T2Retrieval | precision_at_100 | 4.989 |
Retrieval | C-MTEB/T2Retrieval | precision_at_1000 | 0.516 |
Retrieval | C-MTEB/T2Retrieval | precision_at_3 | 75.452 |
Retrieval | C-MTEB/T2Retrieval | precision_at_5 | 62.785000000000004 |
Retrieval | C-MTEB/T2Retrieval | recall_at_1 | 28.825 |
Retrieval | C-MTEB/T2Retrieval | recall_at_10 | 84.202 |
Retrieval | C-MTEB/T2Retrieval | recall_at_100 | 95.768 |
Retrieval | C-MTEB/T2Retrieval | recall_at_1000 | 98.791 |
Retrieval | C-MTEB/T2Retrieval | recall_at_3 | 57.284 |
Retrieval | C-MTEB/T2Retrieval | recall_at_5 | 71.071 |
Classification | C-MTEB/TNews-classification | accuracy | 52.160000000000004 |
Classification | C-MTEB/TNews-classification | f1 | 50.49492950548829 |
Clustering | C-MTEB/ThuNewsClusteringP2P | v_measure | 70.06019845009966 |
Clustering | C-MTEB/ThuNewsClusteringS2S | v_measure | 63.9370959228245 |
Retrieval | C-MTEB/VideoRetrieval | map_at_1 | 60.0 |
Retrieval | C-MTEB/VideoRetrieval | map_at_10 | 69.362 |
Retrieval | C-MTEB/VideoRetrieval | map_at_100 | 69.819 |
Retrieval | C-MTEB/VideoRetrieval | map_at_1000 | 69.833 |
Retrieval | C-MTEB/VideoRetrieval | map_at_3 | 67.783 |
Retrieval | C-MTEB/VideoRetrieval | map_at_5 | 68.71300000000001 |
Retrieval | C-MTEB/VideoRetrieval | mrr_at_1 | 60.0 |
Retrieval | C-MTEB/VideoRetrieval | mrr_at_10 | 69.362 |
Retrieval | C-MTEB/VideoRetrieval | mrr_at_100 | 69.819 |
Retrieval | C-MTEB/VideoRetrieval | mrr_at_1000 | 69.833 |
Retrieval | C-MTEB/VideoRetrieval | mrr_at_3 | 67.783 |
Retrieval | C-MTEB/VideoRetrieval | mrr_at_5 | 68.71300000000001 |
Retrieval | C-MTEB/VideoRetrieval | ndcg_at_1 | 60.0 |
Retrieval | C-MTEB/VideoRetrieval | ndcg_at_10 | 73.59400000000001 |
Retrieval | C-MTEB/VideoRetrieval | ndcg_at_100 | 75.734 |
Retrieval | C-MTEB/VideoRetrieval | ndcg_at_1000 | 76.049 |
Retrieval | C-MTEB/VideoRetrieval | ndcg_at_3 | 70.33 |
Retrieval | C-MTEB/VideoRetrieval | ndcg_at_5 | 72.033 |
Retrieval | C-MTEB/VideoRetrieval | precision_at_1 | 60.0 |
Retrieval | C-MTEB/VideoRetrieval | precision_at_10 | 8.67 |
Retrieval | C-MTEB/VideoRetrieval | precision_at_100 | 0.9650000000000001 |
Retrieval | C-MTEB/VideoRetrieval | precision_at_1000 | 0.099 |
Retrieval | C-MTEB/VideoRetrieval | precision_at_3 | 25.900000000000002 |
Retrieval | C-MTEB/VideoRetrieval | precision_at_5 | 16.38 |
Retrieval | C-MTEB/VideoRetrieval | recall_at_1 | 60.0 |
Retrieval | C-MTEB/VideoRetrieval | recall_at_10 | 86.7 |
Retrieval | C-MTEB/VideoRetrieval | recall_at_100 | 96.5 |
Retrieval | C-MTEB/VideoRetrieval | recall_at_1000 | 98.9 |
Retrieval | C-MTEB/VideoRetrieval | recall_at_3 | 77.7 |
Retrieval | C-MTEB/VideoRetrieval | recall_at_5 | 81.89999999999999 |
Classification | C-MTEB/waimai-classification | accuracy | 88.36 |
Classification | C-MTEB/waimai-classification | ap | 73.25144216855439 |
Classification | C-MTEB/waimai-classification | f1 | 86.75076261442027 |
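To help read the retrieval rows, the sketch below computes `precision_at_k`, `recall_at_k`, reciprocal rank, and average precision for a single query. It is written from the standard definitions, with made-up document IDs. The table values appear to be these per-query numbers averaged over all queries and multiplied by 100, and the exact normalization in the evaluation toolkit may differ from this sketch.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents found in the top k."""
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)

def reciprocal_rank_at_k(ranked, relevant, k):
    """1 / rank of the first relevant result in the top k, or 0 if none."""
    for rank, d in enumerate(ranked[:k], start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0

def average_precision_at_k(ranked, relevant, k):
    """Sum of precision at each relevant hit in the top k, over all relevant docs."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked[:k], start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

# Made-up ranking for one query; "d8" is relevant but was not retrieved.
ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d7", "d4", "d8"}

p5 = precision_at_k(ranked, relevant, 5)            # 2 hits / 5
r5 = recall_at_k(ranked, relevant, 5)               # 2 hits / 3 relevant
rr5 = reciprocal_rank_at_k(ranked, relevant, 5)     # first hit at rank 2
ap5 = average_precision_at_k(ranked, relevant, 5)   # (1/2 + 2/5) / 3
print(p5, r5, rr5, ap5)
```

When every query has exactly one relevant document, average precision and reciprocal rank coincide, which is consistent with the identical `map_at_k` and `mrr_at_k` values in the EcomRetrieval and VideoRetrieval rows.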