Tao 8k
Model Overview
This model is primarily used to compute the similarity between sentences. It supports multiple similarity metrics, including cosine similarity, Euclidean distance, and Manhattan distance, and is suited to text matching, information retrieval, and similar scenarios.
Model Features
Multi-Metric Similarity Computation
Supports multiple similarity metrics, including cosine similarity, Euclidean distance, and Manhattan distance (see the sketch after this list).
Optimized for Chinese
Optimized for Chinese text, with strong results on several Chinese evaluation datasets.
Broad Task Support
Supports sentence similarity, classification, clustering, reranking, retrieval, and other tasks.
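Each of the three metrics corresponds to a standard vector operation on the sentence embeddings. Below is a minimal sketch using sentence-transformers and NumPy; the repository id `amu/tao-8k` is an assumption, so substitute the actual model path:

```python
# Minimal multi-metric similarity sketch. The model path "amu/tao-8k" is an
# assumption; substitute the actual repository id.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("amu/tao-8k")

# Encode a sentence pair into dense vectors.
a, b = model.encode(["如何办理信用卡", "信用卡的申请流程是什么"])

cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))  # higher = more similar
euclidean = float(np.linalg.norm(a - b))  # lower = more similar
manhattan = float(np.abs(a - b).sum())    # lower = more similar

print(f"cosine={cosine:.4f}  euclidean={euclidean:.4f}  manhattan={manhattan:.4f}")
```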
Model Capabilities
Sentence similarity computation
Text classification
Text clustering
Information retrieval
Reranking
Use Cases
Text Matching
Question-Answer Matching
Matches questions against candidate answers by similarity to improve the accuracy of question-answering systems.
Achieves a mean average precision (MAP) above 85% on the CMedQAv1 and CMedQAv2 datasets.
Semantic Similarity Computation
Computes the semantic similarity of two sentences, for use in recommendation systems or search ranking.
Performs well on the AFQMC and ATEC datasets.
Information Retrieval
Document Retrieval
Retrieves the documents most relevant to a query.
Performs well on the CmedqaRetrieval and CovidRetrieval datasets.
🚀 The tao-8k-origin Model
tao-8k-origin is a sentence-similarity model that has been evaluated on a range of natural language processing tasks and datasets, showing solid performance. It extends the context length beyond the base model, and a structurally different tao-v2 version is planned as a follow-up.
🚀 Quick Start
Model Usage
This model is mainly intended for sentence-similarity tasks such as text matching, clustering, and classification.
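A minimal quick-start sketch for text matching, assuming the checkpoint loads through sentence-transformers under the (assumed) repository id `amu/tao-8k`:

```python
# Quick-start text-matching sketch; "amu/tao-8k" is an assumed model path.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("amu/tao-8k")

query = "糖尿病患者饮食上需要注意什么"
candidates = [
    "糖尿病患者应控制碳水化合物摄入,少食多餐。",
    "高血压患者需要限制钠盐的摄入量。",
    "今天天气晴朗,适合户外运动。",
]

query_emb = model.encode(query, convert_to_tensor=True)
cand_embs = model.encode(candidates, convert_to_tensor=True)

# Rank candidates by cosine similarity to the query.
scores = util.cos_sim(query_emb, cand_embs)[0]
for text, score in sorted(zip(candidates, scores.tolist()), key=lambda p: -p[1]):
    print(f"{score:.4f}  {text}")
```

The same encode-then-score pattern extends to clustering and classification by feeding the embeddings to any standard clusterer or classifier.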
Model Features
- Uses an approach similar to stella-v2, but extends the context length on top of tao.
- To use the full 8k context, you may need to convert the model to float32.
Future Plans
tao-v2, which will have a different architecture, is under development and will be released as soon as possible.
✨ Main Features
- Context extension: extends the context length on top of tao to handle longer texts.
- Multi-task support: tested on a range of natural language processing tasks, including semantic textual similarity (STS), classification, clustering, reranking, and retrieval.
📚 Detailed Documentation
Model Information
Property | Details |
---|---|
Model Type | Sentence similarity model |
Training Data | Not specified |
Evaluation Results
The model's evaluation metrics on various datasets and tasks are listed below.
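These figures follow the MTEB/C-MTEB evaluation protocol. A hedged sketch of reproducing a subset is shown below; the `MTEB` entry point and task names may vary across mteb versions, and the model path `amu/tao-8k` is an assumption:

```python
# Hedged sketch of re-running two of the STS tasks below with the MTEB
# benchmark (pip install mteb). API details may differ by mteb version;
# "amu/tao-8k" is an assumed model path.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("amu/tao-8k")
evaluation = MTEB(tasks=["AFQMC", "ATEC"])  # C-MTEB task names
evaluation.run(model, output_folder="results/tao-8k")
```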
Semantic Textual Similarity (STS) Tasks
Dataset | Metric Type | Value |
---|---|---|
C-MTEB/AFQMC | cos_sim_pearson | 47.33644889578121 |
C-MTEB/AFQMC | cos_sim_spearman | 49.93968642502866 |
C-MTEB/AFQMC | euclidean_pearson | 48.12029792973887 |
C-MTEB/AFQMC | euclidean_spearman | 49.939666315145494 |
C-MTEB/AFQMC | manhattan_pearson | 48.07449594650583 |
C-MTEB/AFQMC | manhattan_spearman | 49.892461433911166 |
C-MTEB/ATEC | cos_sim_pearson | 50.976148098905746 |
C-MTEB/ATEC | cos_sim_spearman | 53.11230114448237 |
C-MTEB/ATEC | euclidean_pearson | 55.119977161851054 |
C-MTEB/ATEC | euclidean_spearman | 53.11229776647941 |
C-MTEB/ATEC | manhattan_pearson | 55.096968162828034 |
C-MTEB/ATEC | manhattan_spearman | 53.107481302419465 |
C-MTEB/BQ | cos_sim_pearson | 62.843816050026824 |
C-MTEB/BQ | cos_sim_spearman | 65.54142642656706 |
C-MTEB/BQ | euclidean_pearson | 64.08809634876388 |
C-MTEB/BQ | euclidean_spearman | 65.54142642558392 |
C-MTEB/BQ | manhattan_pearson | 64.09391522108272 |
C-MTEB/BQ | manhattan_spearman | 65.55445491162718 |
C-MTEB/LCQMC | cos_sim_pearson | 71.17874301231225 |
C-MTEB/LCQMC | cos_sim_spearman | 77.47936067899236 |
C-MTEB/LCQMC | euclidean_pearson | 76.3241109984839 |
C-MTEB/LCQMC | euclidean_spearman | 77.47936511149533 |
C-MTEB/LCQMC | manhattan_pearson | 76.3334642249198 |
C-MTEB/LCQMC | manhattan_spearman | 77.48889610190774 |
C-MTEB/PAWSX | cos_sim_pearson | 30.06960208048424 |
C-MTEB/PAWSX | cos_sim_spearman | 36.21568893707218 |
C-MTEB/PAWSX | euclidean_pearson | 36.3789158810154 |
C-MTEB/PAWSX | euclidean_spearman | 36.21568740241203 |
C-MTEB/PAWSX | manhattan_pearson | 36.318190228955935 |
C-MTEB/PAWSX | manhattan_spearman | 36.16813420759451 |
C-MTEB/QBQTC | cos_sim_pearson | 36.779942621488736 |
C-MTEB/QBQTC | cos_sim_spearman | 38.73716529566492 |
C-MTEB/QBQTC | euclidean_pearson | 37.134107612179605 |
C-MTEB/QBQTC | euclidean_spearman | 38.737099842399545 |
C-MTEB/QBQTC | manhattan_pearson | 37.17579625045808 |
C-MTEB/QBQTC | manhattan_spearman | 38.746051563332315 |
mteb/sts22-crosslingual-sts | cos_sim_pearson | 65.97416499132073 |
mteb/sts22-crosslingual-sts | cos_sim_spearman | 68.87894646940939 |
mteb/sts22-crosslingual-sts | euclidean_pearson | 67.2366929400408 |
mteb/sts22-crosslingual-sts | euclidean_spearman | 68.87894646940939 |
mteb/sts22-crosslingual-sts | manhattan_pearson | 67.30590304353478 |
mteb/sts22-crosslingual-sts | manhattan_spearman | 68.90546655032796 |
C-MTEB/STSB | cos_sim_pearson | 78.99420906581649 |
C-MTEB/STSB | cos_sim_spearman | 79.36553449000968 |
C-MTEB/STSB | euclidean_pearson | 78.77734144763518 |
C-MTEB/STSB | euclidean_spearman | 79.36545230850567 |
C-MTEB/STSB | manhattan_pearson | 78.82512507141092 |
C-MTEB/STSB | manhattan_spearman | 79.43977311125059 |
Classification Tasks
Dataset | Metric Type | Value |
---|---|---|
mteb/amazon_reviews_multi | accuracy | 40.804 |
mteb/amazon_reviews_multi | f1 | 39.01066543513968 |
C-MTEB/IFlyTek-classification | accuracy | 47.98768757214313 |
C-MTEB/IFlyTek-classification | f1 | 35.24243089488371 |
C-MTEB/JDReview-classification | accuracy | 87.01688555347093 |
C-MTEB/JDReview-classification | ap | 56.39167630414159 |
C-MTEB/JDReview-classification | f1 | 81.91756262306008 |
mteb/amazon_massive_intent | accuracy | 68.2649630127774 |
mteb/amazon_massive_intent | f1 | 65.96868218344183 |
mteb/amazon_massive_scenario | accuracy | 73.13382649630127 |
mteb/amazon_massive_scenario | f1 | 72.69980239148315 |
C-MTEB/MultilingualSentiment-classification | accuracy | 73.39 |
C-MTEB/MultilingualSentiment-classification | f1 | 72.85739851837214 |
C-MTEB/OnlineShopping-classification | accuracy | 91.81000000000002 |
C-MTEB/OnlineShopping-classification | ap | 89.35809579688139 |
C-MTEB/OnlineShopping-classification | f1 | 91.79220350456818 |
C-MTEB/TNews-classification | accuracy | 51.927 |
C-MTEB/TNews-classification | f1 | 50.16838216110367 |
C-MTEB/waimai-classification | accuracy | 87.10000000000001 |
C-MTEB/waimai-classification | ap | 70.81766065881429 |
C-MTEB/waimai-classification | f1 | 85.5323306120456 |
Clustering Tasks
Dataset | Metric Type | Value |
---|---|---|
C-MTEB/CLSClusteringP2P | v_measure | 40.028061591547804 |
C-MTEB/CLSClusteringS2S | v_measure | 38.1897102944254 |
C-MTEB/ThuNewsClusteringP2P | v_measure | 60.85131720842154 |
C-MTEB/ThuNewsClusteringS2S | v_measure | 57.0921610946628 |
Reranking Tasks
Dataset | Metric Type | Value |
---|---|---|
C-MTEB/CMedQAv1-reranking | map | 85.34294439514511 |
C-MTEB/CMedQAv1-reranking | mrr | 88.03849206349206 |
C-MTEB/CMedQAv2-reranking | map | 85.81294364673899 |
C-MTEB/CMedQAv2-reranking | mrr | 88.52146825396825 |
C-MTEB/Mmarco-reranking | map | 27.96872431410137 |
C-MTEB/Mmarco-reranking | mrr | 26.92023809523809 |
C-MTEB/T2Reranking | map | 66.38018284846501 |
C-MTEB/T2Reranking | mrr | 76.11180965277104 |
Retrieval Tasks
Dataset | Metric Type | Value |
---|---|---|
C-MTEB/CmedqaRetrieval | map_at_1 | 23.982 |
C-MTEB/CmedqaRetrieval | map_at_10 | 36.21 |
C-MTEB/CmedqaRetrieval | map_at_100 | 38.072 |
C-MTEB/CmedqaRetrieval | map_at_1000 | 38.194 |
C-MTEB/CmedqaRetrieval | map_at_3 | 32.239000000000004 |
C-MTEB/CmedqaRetrieval | map_at_5 | 34.377 |
C-MTEB/CmedqaRetrieval | mrr_at_1 | 36.858999999999995 |
C-MTEB/CmedqaRetrieval | mrr_at_10 | 45.084999999999994 |
C-MTEB/CmedqaRetrieval | mrr_at_100 | 46.104 |
C-MTEB/CmedqaRetrieval | mrr_at_1000 | 46.154 |
C-MTEB/CmedqaRetrieval | mrr_at_3 | 42.623 |
C-MTEB/CmedqaRetrieval | mrr_at_5 | 43.995 |
C-MTEB/CmedqaRetrieval | ndcg_at_1 | 36.858999999999995 |
C-MTEB/CmedqaRetrieval | ndcg_at_10 | 42.735 |
C-MTEB/CmedqaRetrieval | ndcg_at_100 | 50.181 |
C-MTEB/CmedqaRetrieval | ndcg_at_1000 | 52.309000000000005 |
C-MTEB/CmedqaRetrieval | ndcg_at_3 | 37.728 |
C-MTEB/CmedqaRetrieval | ndcg_at_5 | 39.664 |
C-MTEB/CmedqaRetrieval | precision_at_1 | 36.858999999999995 |
C-MTEB/CmedqaRetrieval | precision_at_10 | 9.615 |
C-MTEB/CmedqaRetrieval | precision_at_100 | 1.564 |
C-MTEB/CmedqaRetrieval | precision_at_1000 | 0.183 |
C-MTEB/CmedqaRetrieval | precision_at_3 | 21.514 |
C-MTEB/CmedqaRetrieval | precision_at_5 | 15.568999999999999 |
C-MTEB/CmedqaRetrieval | recall_at_1 | 23.982 |
C-MTEB/CmedqaRetrieval | recall_at_10 | 53.04600000000001 |
C-MTEB/CmedqaRetrieval | recall_at_100 | 84.113 |
C-MTEB/CmedqaRetrieval | recall_at_1000 | 98.37 |
C-MTEB/CmedqaRetrieval | recall_at_3 | 37.824999999999996 |
C-MTEB/CmedqaRetrieval | recall_at_5 | 44.023 |
C-MTEB/CovidRetrieval | map_at_1 | 70.706 |
C-MTEB/CovidRetrieval | map_at_10 | 78.619 |
C-MTEB/CovidRetrieval | map_at_100 | 78.915 |
C-MTEB/CovidRetrieval | map_at_1000 | 78.918 |
C-MTEB/CovidRetrieval | map_at_3 | 76.967 |
C-MTEB/CovidRetrieval | map_at_5 | 77.922 |
C-MTEB/CovidRetrieval | mrr_at_1 | 70.917 |
C-MTEB/CovidRetrieval | mrr_at_10 | 78.64 |
C-MTEB/CovidRetrieval | mrr_at_100 | 78.935 |
C-MTEB/CovidRetrieval | mrr_at_1000 | 78.938 |
C-MTEB/CovidRetrieval | mrr_at_3 | 77.081 |
C-MTEB/CovidRetrieval | mrr_at_5 | 77.972 |
C-MTEB/CovidRetrieval | ndcg_at_1 | 70.917 |
C-MTEB/CovidRetrieval | ndcg_at_10 | 82.186 |
C-MTEB/CovidRetrieval | ndcg_at_100 | 83.487 |
C-MTEB/CovidRetrieval | ndcg_at_1000 | 83.589 |
C-MTEB/CovidRetrieval | ndcg_at_3 | 78.874 |
C-MTEB/CovidRetrieval | ndcg_at_5 | 80.548 |
C-MTEB/CovidRetrieval | precision_at_1 | 70.917 |
C-MTEB/CovidRetrieval | precision_at_10 | 9.431000000000001 |
C-MTEB/CovidRetrieval | precision_at_100 | 1.001 |
C-MTEB/CovidRetrieval | precision_at_1000 | 0.101 |
C-MTEB/CovidRetrieval | precision_at_3 | 28.275 |
C-MTEB/CovidRetrieval | precision_at_5 | 17.829 |
C-MTEB/CovidRetrieval | recall_at_1 | 70.706 |
C-MTEB/CovidRetrieval | recall_at_10 | 93.256 |
C-MTEB/CovidRetrieval | recall_at_100 | 99.05199999999999 |
C-MTEB/CovidRetrieval | recall_at_1000 | 99.895 |
C-MTEB/CovidRetrieval | recall_at_3 | 84.247 |
C-MTEB/CovidRetrieval | recall_at_5 | 88.251 |
C-MTEB/DuRetrieval | map_at_1 | 25.989 |
C-MTEB/DuRetrieval | map_at_10 | 80.882 |
C-MTEB/DuRetrieval | map_at_100 | 83.63199999999999 |
C-MTEB/DuRetrieval | map_at_1000 | 83.663 |
C-MTEB/DuRetrieval | map_at_3 | 55.772 |
C-MTEB/DuRetrieval | map_at_5 | 70.598 |
C-MTEB/DuRetrieval | mrr_at_1 | 90.14999999999999 |
C-MTEB/DuRetrieval | mrr_at_10 | 93.30000000000001 |
C-MTEB/DuRetrieval | mrr_at_100 | 93.363 |
C-MTEB/DuRetrieval | mrr_at_1000 | 93.366 |
C-MTEB/DuRetrieval | mrr_at_3 | 93.083 |
C-MTEB/DuRetrieval | mrr_at_5 | 93.206 |
C-MTEB/DuRetrieval | ndcg_at_1 | 90.14999999999999 |
C-MTEB/DuRetrieval | ndcg_at_10 | 88.016 |
C-MTEB/DuRetrieval | ndcg_at_100 | 90.52900000000001 |
C-MTEB/DuRetrieval | ndcg_at_1000 | 90.84400000000001 |
C-MTEB/DuRetrieval | ndcg_at_3 | 86.529 |
C-MTEB/DuRetrieval | ndcg_at_5 | 85.65899999999999 |
C-MTEB/DuRetrieval | precision_at_1 | 90.14999999999999 |
C-MTEB/DuRetrieval | precision_at_10 | 42.295 |
C-MTEB/DuRetrieval | precision_at_100 | 4.826 |
C-MTEB/DuRetrieval | precision_at_1000 | 0.48900000000000005 |
C-MTEB/DuRetrieval | precision_at_3 | 77.717 |
C-MTEB/DuRetrieval | precision_at_5 | 65.81 |
C-MTEB/DuRetrieval | recall_at_1 | 25.989 |
C-MTEB/DuRetrieval | recall_at_10 | 89.446 |
C-MTEB/DuRetrieval | recall_at_100 | 97.832 |
C-MTEB/DuRetrieval | recall_at_1000 | 99.568 |
C-MTEB/DuRetrieval | recall_at_3 | 58.223 |
C-MTEB/DuRetrieval | recall_at_5 | 75.411 |
C-MTEB/EcomRetrieval | map_at_1 | 49.6 |
C-MTEB/EcomRetrieval | map_at_10 | 59.512 |
C-MTEB/EcomRetrieval | map_at_100 | 60.059 |
C-MTEB/EcomRetrieval | map_at_1000 | 60.077999999999996 |
C-MTEB/EcomRetrieval | map_at_3 | 56.882999999999996 |
C-MTEB/EcomRetrieval | map_at_5 | 58.298 |
C-MTEB/EcomRetrieval | mrr_at_1 | 49.6 |
C-MTEB/EcomRetrieval | mrr_at_10 | 59.512 |
C-MTEB/EcomRetrieval | mrr_at_100 | 60.059 |
C-MTEB/EcomRetrieval | mrr_at_1000 | 60.077999999999996 |
C-MTEB/EcomRetrieval | mrr_at_3 | 56.882999999999996 |
C-MTEB/EcomRetrieval | mrr_at_5 | 58.298 |
C-MTEB/EcomRetrieval | ndcg_at_1 | 49.6 |
C-MTEB/EcomRetrieval | ndcg_at_10 | 64.71000000000001 |
C-MTEB/EcomRetrieval | ndcg_at_100 | 67.238 |
C-MTEB/EcomRetrieval | ndcg_at_1000 | 67.74 |
C-MTEB/EcomRetrieval | ndcg_at_3 | 59.275 |
C-MTEB/EcomRetrieval | ndcg_at_5 | 61.805 |
C-MTEB/EcomRetrieval | precision_at_1 | 49.6 |
C-MTEB/EcomRetrieval | precision_at_10 | 8.12 |
C-MTEB/EcomRetrieval | precision_at_100 | 0.927 |
C-MTEB/EcomRetrieval | precision_at_1000 | 0.097 |
C-MTEB/EcomRetrieval | precision_at_3 | 22.067 |
C-MTEB/EcomRetrieval | precision_at_5 | 14.46 |
C-MTEB/EcomRetrieval | recall_at_1 | 49.6 |
C-MTEB/EcomRetrieval | recall_at_10 | 81.2 |
C-MTEB/EcomRetrieval | recall_at_100 | 92.7 |
C-MTEB/EcomRetrieval | recall_at_1000 | 96.6 |
C-MTEB/EcomRetrieval | recall_at_3 | 66.2 |
C-MTEB/EcomRetrieval | recall_at_5 | 72.3 |
C-MTEB/MMarcoRetrieval | map_at_1 | 66.83099999999999 |
C-MTEB/MMarcoRetrieval | map_at_10 | 75.945 |
C-MTEB/MMarcoRetrieval | map_at_100 | 76.259 |
C-MTEB/MMarcoRetrieval | map_at_1000 | 76.27000000000001 |
C-MTEB/MMarcoRetrieval | map_at_3 | 74.22999999999999 |
C-MTEB/MMarcoRetrieval | map_at_5 | 75.318 |
C-MTEB/MMarcoRetrieval | mrr_at_1 | 69.069 |
C-MTEB/MMarcoRetrieval | mrr_at_10 | 76.491 |
C-MTEB/MMarcoRetrieval | mrr_at_100 | 76.764 |
C-MTEB/MMarcoRetrieval | mrr_at_1000 | 76.775 |
C-MTEB/MMarcoRetrieval | mrr_at_3 | 75.01 |
C-MTEB/MMarcoRetrieval | mrr_at_5 | 75.934 |
C-MTEB/MMarcoRetrieval | ndcg_at_1 | 69.069 |
C-MTEB/MMarcoRetrieval | ndcg_at_10 | 79.557 |
C-MTEB/MMarcoRetrieval | ndcg_at_100 | 80.946 |
C-MTEB/MMarcoRetrieval | ndcg_at_1000 | 81.23700000000001 |
C-MTEB/MMarcoRetrieval | ndcg_at_3 | 76.31099999999999 |
C-MTEB/MMarcoRetrieval | ndcg_at_5 | 78.121 |
C-MTEB/MMarcoRetrieval | precision_at_1 | 69.069 |
C-MTEB/MMarcoRetrieval | precision_at_10 | 9.58 |
C-MTEB/MMarcoRetrieval | precision_at_100 | 1.027 |
C-MTEB/MMarcoRetrieval | precision_at_1000 | 0.105 |
C-MTEB/MMarcoRetrieval | precision_at_3 | 28.73 |
C-MTEB/MMarcoRetrieval | precision_at_5 | 18.201 |
C-MTEB/MMarcoRetrieval | recall_at_1 | 66.83099999999999 |
C-MTEB/MMarcoRetrieval | recall_at_10 | 90.118 |
C-MTEB/MMarcoRetrieval | recall_at_100 | 96.377 |
C-MTEB/MMarcoRetrieval | recall_at_1000 | 98.656 |
C-MTEB/MMarcoRetrieval | recall_at_3 | 81.516 |
C-MTEB/MMarcoRetrieval | recall_at_5 | 85.798 |
C-MTEB/MedicalRetrieval | map_at_1 | 51.2 |
C-MTEB/MedicalRetrieval | map_at_10 | 57.715 |
C-MTEB/MedicalRetrieval | map_at_100 | 58.233999999999995 |
C-MTEB/MedicalRetrieval | map_at_1000 | 58.289 |
C-MTEB/MedicalRetrieval | map_at_3 | 56.483000000000004 |
C-MTEB/MedicalRetrieval | map_at_5 | 57.193000000000005 |
C-MTEB/MedicalRetrieval | mrr_at_1 | 51.2 |
C-MTEB/MedicalRetrieval | mrr_at_10 | 57.714 |
C-MTEB/MedicalRetrieval | mrr_at_100 | 58.233000000000004 |
C-MTEB/MedicalRetrieval | mrr_at_1000 | 58.288 |
C-MTEB/MedicalRetrieval | mrr_at_3 | 56.483000000000004 |
C-MTEB/MedicalRetrieval | mrr_at_5 | 57.193000000000005 |
C-MTEB/MedicalRetrieval | ndcg_at_1 | 51.2 |
C-MTEB/MedicalRetrieval | ndcg_at_10 | 60.63499999999999 |
C-MTEB/MedicalRetrieval | ndcg_at_100 | 63.458000000000006 |
C-MTEB/MedicalRetrieval | ndcg_at_1000 | 64.992 |
C-MTEB/MedicalRetrieval | ndcg_at_3 | 58.11300000000001 |
C-MTEB/MedicalRetrieval | ndcg_at_5 | 59.391000000000005 |
C-MTEB/MedicalRetrieval | precision_at_1 | 51.2 |
C-MTEB/MedicalRetrieval | precision_at_10 | 6.97 |
C-MTEB/MedicalRetrieval | precision_at_100 | 0.836 |
C-MTEB/MedicalRetrieval | precision_at_1000 | 0.096 |
C-MTEB/MedicalRetrieval | precision_at_3 | 20.933 |
C-MTEB/MedicalRetrieval | precision_at_5 | 13.18 |
C-MTEB/MedicalRetrieval | recall_at_1 | 51.2 |
C-MTEB/MedicalRetrieval | recall_at_10 | 69.69999999999999 |
C-MTEB/MedicalRetrieval | recall_at_100 | 83.6 |
C-MTEB/MedicalRetrieval | recall_at_1000 | 95.8 |
C-MTEB/MedicalRetrieval | recall_at_3 | 62.8 |
C-MTEB/MedicalRetrieval | recall_at_5 | 65.9 |
C-MTEB/T2Retrieval | map_at_1 | 27.423 |
C-MTEB/T2Retrieval | map_at_10 | 77.206 |
C-MTEB/T2Retrieval | map_at_100 | 80.83500000000001 |
C-MTEB/T2Retrieval | map_at_1000 | 80.9 |
C-MTEB/T2Retrieval | map_at_3 | 54.190000000000005 |
C-MTEB/T2Retrieval | map_at_5 | 66.662 |
C-MTEB/T2Retrieval | mrr_at_1 | 90.049 |
C-MTEB/T2Retrieval | mrr_at_10 | 92.48100000000001 |
C-MTEB/T2Retrieval | mrr_at_100 | 92.567 |
C-MTEB/T2Retrieval | mrr_at_1000 | 92.571 |
C-MTEB/T2Retrieval | mrr_at_3 | 92.07 |
C-MTEB/T2Retrieval | mrr_at_5 | 92.32900000000001 |
C-MTEB/T2Retrieval | ndcg_at_1 | 90.049 |
C-MTEB/T2Retrieval | ndcg_at_10 | 84.69 |
C-MTEB/T2Retrieval | ndcg_at_100 | 88.254 |
C-MTEB/T2Retrieval | ndcg_at_1000 | 88.89399999999999 |
C-MTEB/T2Retrieval | ndcg_at_3 | 86.091 |
C-MTEB/T2Retrieval | ndcg_at_5 | 84.685 |
C-MTEB/T2Retrieval | precision_at_1 | 90.049 |
C-MTEB/T2Retrieval | precision_at_10 | 42.141 |
C-MTEB/T2Retrieval | precision_at_100 | 5.016 |
C-MTEB/T2Retrieval | precision_at_1000 | 0.516 |
C-MTEB/T2Retrieval | precision_at_3 | 75.352 |
C-MTEB/T2Retrieval | precision_at_5 | 63.176 |
C-MTEB/T2Retrieval | recall_at_1 | 27.423 |
C-MTEB/T2Retrieval | recall_at_10 | 83.595 |
C-MTEB/T2Retrieval | recall_at_100 | 95.21 |
C-MTEB/T2Retrieval | recall_at_1000 | 98.503 |
C-MTEB/T2Retrieval | recall_at_3 | 55.84400000000001 |
C-MTEB/T2Retrieval | recall_at_5 | 69.987 |
C-MTEB/VideoRetrieval | map_at_1 | 56.99999999999999 |
C-MTEB/VideoRetrieval | map_at_10 | 67.611 |
C-MTEB/VideoRetrieval | map_at_100 | 68.095 |
C-MTEB/VideoRetrieval | map_at_1000 | 68.10300000000001 |
C-MTEB/VideoRetrieval | map_at_3 | 65.75 |
C-MTEB/VideoRetrieval | map_at_5 | 66.93 |
C-MTEB/VideoRetrieval | mrr_at_1 | 56.89999999999999 |
C-MTEB/VideoRetrieval | mrr_at_10 | 67.561 |
C-MTEB/VideoRetrieval | mrr_at_100 | 68.045 |
C-MTEB/VideoRetrieval | mrr_at_1000 | 68.053 |
C-MTEB/VideoRetrieval | mrr_at_3 | 65.7 |
C-MTEB/VideoRetrieval | mrr_at_5 | 66.88 |
C-MTEB/VideoRetrieval | ndcg_at_1 | 56.99999999999999 |
C-MTEB/VideoRetrieval | ndcg_at_10 | 72.25200000000001 |
C-MTEB/VideoRetrieval | ndcg_at_100 | 74.542 |
C-MTEB/VideoRetrieval | ndcg_at_1000 | 74.725 |
C-MTEB/VideoRetrieval | ndcg_at_3 | 68.47 |
C-MTEB/VideoRetrieval | ndcg_at_5 | 70.583 |
C-MTEB/VideoRetrieval | precision_at_1 | 56.99999999999999 |
C-MTEB/VideoRetrieval | precision_at_10 | 8.66 |
C-MTEB/VideoRetrieval | precision_at_100 | 0.972 |
C-MTEB/VideoRetrieval | precision_at_1000 | 0.099 |
C-MTEB/VideoRetrieval | precision_at_3 | 25.433 |
C-MTEB/VideoRetrieval | precision_at_5 | 16.28 |
C-MTEB/VideoRetrieval | recall_at_1 | 56.99999999999999 |
C-MTEB/VideoRetrieval | recall_at_10 | 86.6 |
C-MTEB/VideoRetrieval | recall_at_100 | 97.2 |
C-MTEB/VideoRetrieval | recall_at_1000 | 98.6 |
C-MTEB/VideoRetrieval | recall_at_3 | 76.3 |
C-MTEB/VideoRetrieval | recall_at_5 | 81.39999999999999 |
Pair Classification Tasks
Dataset | Metric Type | Value |
---|---|---|
C-MTEB/CMNLI | cos_sim_accuracy | 77.71497294046902 |
C-MTEB/CMNLI | cos_sim_ap | 86.84526989595028 |
C-MTEB/CMNLI | cos_sim_f1 | 79.31987247608926 |
C-MTEB/CMNLI | cos_sim_precision | 72.70601987142022 |
C-MTEB/CMNLI | cos_sim_recall | 87.2574234276362 |
C-MTEB/CMNLI | dot_accuracy | 77.71497294046902 |
C-MTEB/CMNLI | dot_ap | 86.83880734247957 |
C-MTEB/CMNLI | dot_f1 | 79.31987247608926 |
C-MTEB/CMNLI | dot_precision | 72.70601987142022 |
C-MTEB/CMNLI | dot_recall | 87.2574234276362 |
C-MTEB/CMNLI | euclidean_accuracy | 77.71497294046902 |
C-MTEB/CMNLI | euclidean_ap | 86.84526869685902 |
C-MTEB/CMNLI | euclidean_f1 | 79.31987247608926 |
C-MTEB/CMNLI | euclidean_precision | 72.70601987142022 |
C-MTEB/CMNLI | euclidean_recall | 87.2574234276362 |
C-MTEB/CMNLI | manhattan_accuracy | 77.8111846061335 |
C-MTEB/CMNLI | manhattan_ap | 86.81142881585656 |
C-MTEB/CMNLI | manhattan_f1 | 79.4201671780764 |
C-MTEB/CMNLI | manhattan_precision | 72.53575570158485 |
C-MTEB/CMNLI | manhattan_recall | 87.74842179097499 |
C-MTEB/CMNLI | max_accuracy | 77.8111846061335 |
C-MTEB/CMNLI | max_ap | 86.84526989595028 |
C-MTEB/CMNLI | max_f1 | 79.4201671780764 |
C-MTEB/OCNLI | cos_sim_accuracy | 73.36220898754738 |
C-MTEB/OCNLI | cos_sim_ap | 78.50045169678386 |
C-MTEB/OCNLI | cos_sim_f1 | 75.3875968992248 |
C-MTEB/OCNLI | cos_sim_precision | 69.65085049239033 |
C-MTEB/OCNLI | cos_sim_recall | 82.15417106652588 |
C-MTEB/OCNLI | dot_accuracy | 73.36220898754738 |
C-MTEB/OCNLI | dot_ap | 78.50039148302838 |
C-MTEB/OCNLI | dot_f1 | 75.3875968992248 |
C-MTEB/OCNLI | dot_precision | 69.65085049239033 |
C-MTEB/OCNLI | dot_recall | 82.15417106652588 |
C-MTEB/OCNLI | euclidean_accuracy | 73.36220898754738 |
C-MTEB/OCNLI | euclidean_ap | 78.50045169678386 |
C-MTEB/OCNLI | euclidean_f1 | 75.3875968992248 |
C-MTEB/OCNLI | euclidean_precision | 69.65085049239033 |
C-MTEB/OCNLI | euclidean_recall | 82.15417106652588 |
C-MTEB/OCNLI | manhattan_accuracy | 73.09149972929075 |
C-MTEB/OCNLI | manhattan_ap | 78.40911589236852 |
C-MTEB/OCNLI | manhattan_f1 | 75.3623188405797 |
C-MTEB/OCNLI | manhattan_precision | 69.45681211041853 |
C-MTEB/OCNLI | manhattan_recall | 82.36536430834214 |
C-MTEB/OCNLI | max_accuracy | 73.36220898754738 |
C-MTEB/OCNLI | max_ap | 78.50045169678386 |
C-MTEB/OCNLI | max_f1 | 75.3875968992248 |
🔧 Technical Details
Model Approach
This model uses the same approach as stella-v2, but extends the context length on top of tao.
Notes on Context Length
To use the full 8k context, you may need to convert the model to float32.
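A minimal sketch of that conversion, assuming a sentence-transformers checkpoint (the path `amu/tao-8k` is an assumption):

```python
# Cast the model to float32 before using the full 8k context;
# "amu/tao-8k" is an assumed model path.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("amu/tao-8k")
model = model.float()          # cast all weights to float32
model.max_seq_length = 8192    # allow the full extended context
embedding = model.encode("一段需要完整 8k 上下文的长文本……")
```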
📄 License
This model is released under the Apache-2.0 license.
> ⚠️ Important Note
> To use the full 8k context, you may need to convert the model to float32.
> 💡 Usage Tip
> Keep an eye on the release of tao-v2, which may bring better performance and new features.