# Tao 8k

## Model Overview

This model computes the similarity between sentences. It supports several similarity metrics, including cosine similarity, Euclidean distance, and Manhattan distance, and is suited to scenarios such as text matching and information retrieval.
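As a hedged illustration of these metrics, the sketch below compares one pair of sentences under all three. It assumes the weights load as a standard sentence-transformers model under the hub id `amu/tao-8k`; that id is an assumption, not something this card states.

```python
# Minimal sketch: compare two sentences under the three supported metrics.
# The hub id "amu/tao-8k" is assumed, not confirmed by this card.
import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("amu/tao-8k")
emb = model.encode(["今天天气很好", "今天天气不错"], convert_to_tensor=True)

# Cosine similarity: higher means more similar.
cos = torch.nn.functional.cosine_similarity(emb[0], emb[1], dim=0)
# Euclidean (L2) and Manhattan (L1) distance: lower means more similar.
l2 = torch.dist(emb[0], emb[1], p=2)
l1 = torch.dist(emb[0], emb[1], p=1)
print(f"cosine={cos:.4f}  euclidean={l2:.4f}  manhattan={l1:.4f}")
```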
## Model Highlights

- **Multi-metric similarity**: supports several similarity measures, including cosine similarity, Euclidean distance, and Manhattan distance.
- **Optimized for Chinese**: tuned for Chinese text, with strong results on multiple Chinese evaluation datasets.
- **Broad task support**: covers sentence similarity, classification, clustering, reranking, and retrieval.
## Model Capabilities

- Sentence similarity computation
- Text classification
- Text clustering
- Information retrieval
- Reranking
## Use Cases

### Text Matching

- **Question-answer matching**: scores the similarity between questions and answers to improve the accuracy of QA systems. On the CMedQAv1 and CMedQAv2 reranking sets, mean average precision (MAP) exceeds 85%.
- **Semantic similarity**: computes the semantic similarity of two sentences for recommendation systems or search ranking, with strong results on AFQMC and ATEC.

### Information Retrieval

- **Document retrieval**: retrieves the documents most relevant to a query, performing well on CmedqaRetrieval and CovidRetrieval.
## 🚀 The tao-8k-origin Model

tao-8k-origin is a sentence-similarity model that has been evaluated on a range of natural language processing tasks and datasets, showing solid performance. It experiments with an extended context length, and a structurally different tao-v2 release is planned.
## 🚀 Quick Start

### Intended Use

The model targets sentence-similarity tasks such as text matching, clustering, and classification (a quick-start sketch follows the notes below).

### Model Notes

- Uses the same method as stella-v2, but extends the context length on top of tao.
- To use the full 8k context, you may need to convert the model to float32.
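A minimal quick-start sketch under the same assumption about the hub id; normalizing the embeddings makes the dot product equal to cosine similarity.

```python
# Hedged quick-start sketch; "amu/tao-8k" is an assumed hub id.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("amu/tao-8k")
sentences = ["如何办理银行卡", "银行卡怎么申请开户"]
embeddings = model.encode(sentences, normalize_embeddings=True)

# With L2-normalized embeddings, the dot product equals cosine similarity.
score = float(embeddings[0] @ embeddings[1])
print(f"similarity: {score:.4f}")
```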
### Roadmap

tao-v2, which will have a different architecture, is under development and will be released as soon as possible.
## ✨ Key Features

- **Extended context**: extends the context length on top of tao to handle longer texts.
- **Multi-task support**: evaluated on a range of NLP tasks, including semantic textual similarity (STS), classification, clustering, reranking, and retrieval; a retrieval sketch follows this list.
## 📚 Documentation

### Model Information

Attribute | Details |
---|---|
Model type | Sentence similarity model |
Training data | Not specified |
### Evaluation Results

The model's evaluation metrics on the various datasets and tasks are listed below.

#### Semantic Textual Similarity (STS)

Dataset | Metric | Value |
---|---|---|
C-MTEB/AFQMC | cos_sim_pearson | 47.33644889578121 |
C-MTEB/AFQMC | cos_sim_spearman | 49.93968642502866 |
C-MTEB/AFQMC | euclidean_pearson | 48.12029792973887 |
C-MTEB/AFQMC | euclidean_spearman | 49.939666315145494 |
C-MTEB/AFQMC | manhattan_pearson | 48.07449594650583 |
C-MTEB/AFQMC | manhattan_spearman | 49.892461433911166 |
C-MTEB/ATEC | cos_sim_pearson | 50.976148098905746 |
C-MTEB/ATEC | cos_sim_spearman | 53.11230114448237 |
C-MTEB/ATEC | euclidean_pearson | 55.119977161851054 |
C-MTEB/ATEC | euclidean_spearman | 53.11229776647941 |
C-MTEB/ATEC | manhattan_pearson | 55.096968162828034 |
C-MTEB/ATEC | manhattan_spearman | 53.107481302419465 |
C-MTEB/BQ | cos_sim_pearson | 62.843816050026824 |
C-MTEB/BQ | cos_sim_spearman | 65.54142642656706 |
C-MTEB/BQ | euclidean_pearson | 64.08809634876388 |
C-MTEB/BQ | euclidean_spearman | 65.54142642558392 |
C-MTEB/BQ | manhattan_pearson | 64.09391522108272 |
C-MTEB/BQ | manhattan_spearman | 65.55445491162718 |
C-MTEB/LCQMC | cos_sim_pearson | 71.17874301231225 |
C-MTEB/LCQMC | cos_sim_spearman | 77.47936067899236 |
C-MTEB/LCQMC | euclidean_pearson | 76.3241109984839 |
C-MTEB/LCQMC | euclidean_spearman | 77.47936511149533 |
C-MTEB/LCQMC | manhattan_pearson | 76.3334642249198 |
C-MTEB/LCQMC | manhattan_spearman | 77.48889610190774 |
C-MTEB/PAWSX | cos_sim_pearson | 30.06960208048424 |
C-MTEB/PAWSX | cos_sim_spearman | 36.21568893707218 |
C-MTEB/PAWSX | euclidean_pearson | 36.3789158810154 |
C-MTEB/PAWSX | euclidean_spearman | 36.21568740241203 |
C-MTEB/PAWSX | manhattan_pearson | 36.318190228955935 |
C-MTEB/PAWSX | manhattan_spearman | 36.16813420759451 |
C-MTEB/QBQTC | cos_sim_pearson | 36.779942621488736 |
C-MTEB/QBQTC | cos_sim_spearman | 38.73716529566492 |
C-MTEB/QBQTC | euclidean_pearson | 37.134107612179605 |
C-MTEB/QBQTC | euclidean_spearman | 38.737099842399545 |
C-MTEB/QBQTC | manhattan_pearson | 37.17579625045808 |
C-MTEB/QBQTC | manhattan_spearman | 38.746051563332315 |
mteb/sts22-crosslingual-sts | cos_sim_pearson | 65.97416499132073 |
mteb/sts22-crosslingual-sts | cos_sim_spearman | 68.87894646940939 |
mteb/sts22-crosslingual-sts | euclidean_pearson | 67.2366929400408 |
mteb/sts22-crosslingual-sts | euclidean_spearman | 68.87894646940939 |
mteb/sts22-crosslingual-sts | manhattan_pearson | 67.30590304353478 |
mteb/sts22-crosslingual-sts | manhattan_spearman | 68.90546655032796 |
C-MTEB/STSB | cos_sim_pearson | 78.99420906581649 |
C-MTEB/STSB | cos_sim_spearman | 79.36553449000968 |
C-MTEB/STSB | euclidean_pearson | 78.77734144763518 |
C-MTEB/STSB | euclidean_spearman | 79.36545230850567 |
C-MTEB/STSB | manhattan_pearson | 78.82512507141092 |
C-MTEB/STSB | manhattan_spearman | 79.43977311125059 |
#### Classification

Dataset | Metric | Value |
---|---|---|
mteb/amazon_reviews_multi | accuracy | 40.804 |
mteb/amazon_reviews_multi | f1 | 39.01066543513968 |
C-MTEB/IFlyTek-classification | accuracy | 47.98768757214313 |
C-MTEB/IFlyTek-classification | f1 | 35.24243089488371 |
C-MTEB/JDReview-classification | accuracy | 87.01688555347093 |
C-MTEB/JDReview-classification | ap | 56.39167630414159 |
C-MTEB/JDReview-classification | f1 | 81.91756262306008 |
mteb/amazon_massive_intent | accuracy | 68.2649630127774 |
mteb/amazon_massive_intent | f1 | 65.96868218344183 |
mteb/amazon_massive_scenario | accuracy | 73.13382649630127 |
mteb/amazon_massive_scenario | f1 | 72.69980239148315 |
C-MTEB/MultilingualSentiment-classification | accuracy | 73.39 |
C-MTEB/MultilingualSentiment-classification | f1 | 72.85739851837214 |
C-MTEB/OnlineShopping-classification | accuracy | 91.81000000000002 |
C-MTEB/OnlineShopping-classification | ap | 89.35809579688139 |
C-MTEB/OnlineShopping-classification | f1 | 91.79220350456818 |
C-MTEB/TNews-classification | accuracy | 51.927 |
C-MTEB/TNews-classification | f1 | 50.16838216110367 |
C-MTEB/waimai-classification | accuracy | 87.10000000000001 |
C-MTEB/waimai-classification | ap | 70.81766065881429 |
C-MTEB/waimai-classification | f1 | 85.5323306120456 |
#### Clustering

Dataset | Metric | Value |
---|---|---|
C-MTEB/CLSClusteringP2P | v_measure | 40.028061591547804 |
C-MTEB/CLSClusteringS2S | v_measure | 38.1897102944254 |
C-MTEB/ThuNewsClusteringP2P | v_measure | 60.85131720842154 |
C-MTEB/ThuNewsClusteringS2S | v_measure | 57.0921610946628 |
#### Reranking

Dataset | Metric | Value |
---|---|---|
C-MTEB/CMedQAv1-reranking | map | 85.34294439514511 |
C-MTEB/CMedQAv1-reranking | mrr | 88.03849206349206 |
C-MTEB/CMedQAv2-reranking | map | 85.81294364673899 |
C-MTEB/CMedQAv2-reranking | mrr | 88.52146825396825 |
C-MTEB/Mmarco-reranking | map | 27.96872431410137 |
C-MTEB/Mmarco-reranking | mrr | 26.92023809523809 |
C-MTEB/T2Reranking | map | 66.38018284846501 |
C-MTEB/T2Reranking | mrr | 76.11180965277104 |
#### Retrieval

Dataset | Metric | Value |
---|---|---|
C-MTEB/CmedqaRetrieval | map_at_1 | 23.982 |
C-MTEB/CmedqaRetrieval | map_at_10 | 36.21 |
C-MTEB/CmedqaRetrieval | map_at_100 | 38.072 |
C-MTEB/CmedqaRetrieval | map_at_1000 | 38.194 |
C-MTEB/CmedqaRetrieval | map_at_3 | 32.239000000000004 |
C-MTEB/CmedqaRetrieval | map_at_5 | 34.377 |
C-MTEB/CmedqaRetrieval | mrr_at_1 | 36.858999999999995 |
C-MTEB/CmedqaRetrieval | mrr_at_10 | 45.084999999999994 |
C-MTEB/CmedqaRetrieval | mrr_at_100 | 46.104 |
C-MTEB/CmedqaRetrieval | mrr_at_1000 | 46.154 |
C-MTEB/CmedqaRetrieval | mrr_at_3 | 42.623 |
C-MTEB/CmedqaRetrieval | mrr_at_5 | 43.995 |
C-MTEB/CmedqaRetrieval | ndcg_at_1 | 36.858999999999995 |
C-MTEB/CmedqaRetrieval | ndcg_at_10 | 42.735 |
C-MTEB/CmedqaRetrieval | ndcg_at_100 | 50.181 |
C-MTEB/CmedqaRetrieval | ndcg_at_1000 | 52.309000000000005 |
C-MTEB/CmedqaRetrieval | ndcg_at_3 | 37.728 |
C-MTEB/CmedqaRetrieval | ndcg_at_5 | 39.664 |
C-MTEB/CmedqaRetrieval | precision_at_1 | 36.858999999999995 |
C-MTEB/CmedqaRetrieval | precision_at_10 | 9.615 |
C-MTEB/CmedqaRetrieval | precision_at_100 | 1.564 |
C-MTEB/CmedqaRetrieval | precision_at_1000 | 0.183 |
C-MTEB/CmedqaRetrieval | precision_at_3 | 21.514 |
C-MTEB/CmedqaRetrieval | precision_at_5 | 15.568999999999999 |
C-MTEB/CmedqaRetrieval | recall_at_1 | 23.982 |
C-MTEB/CmedqaRetrieval | recall_at_10 | 53.04600000000001 |
C-MTEB/CmedqaRetrieval | recall_at_100 | 84.113 |
C-MTEB/CmedqaRetrieval | recall_at_1000 | 98.37 |
C-MTEB/CmedqaRetrieval | recall_at_3 | 37.824999999999996 |
C-MTEB/CmedqaRetrieval | recall_at_5 | 44.023 |
C-MTEB/CovidRetrieval | map_at_1 | 70.706 |
C-MTEB/CovidRetrieval | map_at_10 | 78.619 |
C-MTEB/CovidRetrieval | map_at_100 | 78.915 |
C-MTEB/CovidRetrieval | map_at_1000 | 78.918 |
C-MTEB/CovidRetrieval | map_at_3 | 76.967 |
C-MTEB/CovidRetrieval | map_at_5 | 77.922 |
C-MTEB/CovidRetrieval | mrr_at_1 | 70.917 |
C-MTEB/CovidRetrieval | mrr_at_10 | 78.64 |
C-MTEB/CovidRetrieval | mrr_at_100 | 78.935 |
C-MTEB/CovidRetrieval | mrr_at_1000 | 78.938 |
C-MTEB/CovidRetrieval | mrr_at_3 | 77.081 |
C-MTEB/CovidRetrieval | mrr_at_5 | 77.972 |
C-MTEB/CovidRetrieval | ndcg_at_1 | 70.917 |
C-MTEB/CovidRetrieval | ndcg_at_10 | 82.186 |
C-MTEB/CovidRetrieval | ndcg_at_100 | 83.487 |
C-MTEB/CovidRetrieval | ndcg_at_1000 | 83.589 |
C-MTEB/CovidRetrieval | ndcg_at_3 | 78.874 |
C-MTEB/CovidRetrieval | ndcg_at_5 | 80.548 |
C-MTEB/CovidRetrieval | precision_at_1 | 70.917 |
C-MTEB/CovidRetrieval | precision_at_10 | 9.431000000000001 |
C-MTEB/CovidRetrieval | precision_at_100 | 1.001 |
C-MTEB/CovidRetrieval | precision_at_1000 | 0.101 |
C-MTEB/CovidRetrieval | precision_at_3 | 28.275 |
C-MTEB/CovidRetrieval | precision_at_5 | 17.829 |
C-MTEB/CovidRetrieval | recall_at_1 | 70.706 |
C-MTEB/CovidRetrieval | recall_at_10 | 93.256 |
C-MTEB/CovidRetrieval | recall_at_100 | 99.05199999999999 |
C-MTEB/CovidRetrieval | recall_at_1000 | 99.895 |
C-MTEB/CovidRetrieval | recall_at_3 | 84.247 |
C-MTEB/CovidRetrieval | recall_at_5 | 88.251 |
C-MTEB/DuRetrieval | map_at_1 | 25.989 |
C-MTEB/DuRetrieval | map_at_10 | 80.882 |
C-MTEB/DuRetrieval | map_at_100 | 83.63199999999999 |
C-MTEB/DuRetrieval | map_at_1000 | 83.663 |
C-MTEB/DuRetrieval | map_at_3 | 55.772 |
C-MTEB/DuRetrieval | map_at_5 | 70.598 |
C-MTEB/DuRetrieval | mrr_at_1 | 90.14999999999999 |
C-MTEB/DuRetrieval | mrr_at_10 | 93.30000000000001 |
C-MTEB/DuRetrieval | mrr_at_100 | 93.363 |
C-MTEB/DuRetrieval | mrr_at_1000 | 93.366 |
C-MTEB/DuRetrieval | mrr_at_3 | 93.083 |
C-MTEB/DuRetrieval | mrr_at_5 | 93.206 |
C-MTEB/DuRetrieval | ndcg_at_1 | 90.14999999999999 |
C-MTEB/DuRetrieval | ndcg_at_10 | 88.016 |
C-MTEB/DuRetrieval | ndcg_at_100 | 90.52900000000001 |
C-MTEB/DuRetrieval | ndcg_at_1000 | 90.84400000000001 |
C-MTEB/DuRetrieval | ndcg_at_3 | 86.529 |
C-MTEB/DuRetrieval | ndcg_at_5 | 85.65899999999999 |
C-MTEB/DuRetrieval | precision_at_1 | 90.14999999999999 |
C-MTEB/DuRetrieval | precision_at_10 | 42.295 |
C-MTEB/DuRetrieval | precision_at_100 | 4.826 |
C-MTEB/DuRetrieval | precision_at_1000 | 0.48900000000000005 |
C-MTEB/DuRetrieval | precision_at_3 | 77.717 |
C-MTEB/DuRetrieval | precision_at_5 | 65.81 |
C-MTEB/DuRetrieval | recall_at_1 | 25.989 |
C-MTEB/DuRetrieval | recall_at_10 | 89.446 |
C-MTEB/DuRetrieval | recall_at_100 | 97.832 |
C-MTEB/DuRetrieval | recall_at_1000 | 99.568 |
C-MTEB/DuRetrieval | recall_at_3 | 58.223 |
C-MTEB/DuRetrieval | recall_at_5 | 75.411 |
C-MTEB/EcomRetrieval | map_at_1 | 49.6 |
C-MTEB/EcomRetrieval | map_at_10 | 59.512 |
C-MTEB/EcomRetrieval | map_at_100 | 60.059 |
C-MTEB/EcomRetrieval | map_at_1000 | 60.077999999999996 |
C-MTEB/EcomRetrieval | map_at_3 | 56.882999999999996 |
C-MTEB/EcomRetrieval | map_at_5 | 58.298 |
C-MTEB/EcomRetrieval | mrr_at_1 | 49.6 |
C-MTEB/EcomRetrieval | mrr_at_10 | 59.512 |
C-MTEB/EcomRetrieval | mrr_at_100 | 60.059 |
C-MTEB/EcomRetrieval | mrr_at_1000 | 60.077999999999996 |
C-MTEB/EcomRetrieval | mrr_at_3 | 56.882999999999996 |
C-MTEB/EcomRetrieval | mrr_at_5 | 58.298 |
C-MTEB/EcomRetrieval | ndcg_at_1 | 49.6 |
C-MTEB/EcomRetrieval | ndcg_at_10 | 64.71000000000001 |
C-MTEB/EcomRetrieval | ndcg_at_100 | 67.238 |
C-MTEB/EcomRetrieval | ndcg_at_1000 | 67.74 |
C-MTEB/EcomRetrieval | ndcg_at_3 | 59.275 |
C-MTEB/EcomRetrieval | ndcg_at_5 | 61.805 |
C-MTEB/EcomRetrieval | precision_at_1 | 49.6 |
C-MTEB/EcomRetrieval | precision_at_10 | 8.12 |
C-MTEB/EcomRetrieval | precision_at_100 | 0.927 |
C-MTEB/EcomRetrieval | precision_at_1000 | 0.097 |
C-MTEB/EcomRetrieval | precision_at_3 | 22.067 |
C-MTEB/EcomRetrieval | precision_at_5 | 14.46 |
C-MTEB/EcomRetrieval | recall_at_1 | 49.6 |
C-MTEB/EcomRetrieval | recall_at_10 | 81.2 |
C-MTEB/EcomRetrieval | recall_at_100 | 92.7 |
C-MTEB/EcomRetrieval | recall_at_1000 | 96.6 |
C-MTEB/EcomRetrieval | recall_at_3 | 66.2 |
C-MTEB/EcomRetrieval | recall_at_5 | 72.3 |
C-MTEB/MMarcoRetrieval | map_at_1 | 66.83099999999999 |
C-MTEB/MMarcoRetrieval | map_at_10 | 75.945 |
C-MTEB/MMarcoRetrieval | map_at_100 | 76.259 |
C-MTEB/MMarcoRetrieval | map_at_1000 | 76.27000000000001 |
C-MTEB/MMarcoRetrieval | map_at_3 | 74.22999999999999 |
C-MTEB/MMarcoRetrieval | map_at_5 | 75.318 |
C-MTEB/MMarcoRetrieval | mrr_at_1 | 69.069 |
C-MTEB/MMarcoRetrieval | mrr_at_10 | 76.491 |
C-MTEB/MMarcoRetrieval | mrr_at_100 | 76.764 |
C-MTEB/MMarcoRetrieval | mrr_at_1000 | 76.775 |
C-MTEB/MMarcoRetrieval | mrr_at_3 | 75.01 |
C-MTEB/MMarcoRetrieval | mrr_at_5 | 75.934 |
C-MTEB/MMarcoRetrieval | ndcg_at_1 | 69.069 |
C-MTEB/MMarcoRetrieval | ndcg_at_10 | 79.557 |
C-MTEB/MMarcoRetrieval | ndcg_at_100 | 80.946 |
C-MTEB/MMarcoRetrieval | ndcg_at_1000 | 81.23700000000001 |
C-MTEB/MMarcoRetrieval | ndcg_at_3 | 76.31099999999999 |
C-MTEB/MMarcoRetrieval | ndcg_at_5 | 78.121 |
C-MTEB/MMarcoRetrieval | precision_at_1 | 69.069 |
C-MTEB/MMarcoRetrieval | precision_at_10 | 9.58 |
C-MTEB/MMarcoRetrieval | precision_at_100 | 1.027 |
C-MTEB/MMarcoRetrieval | precision_at_1000 | 0.105 |
C-MTEB/MMarcoRetrieval | precision_at_3 | 28.73 |
C-MTEB/MMarcoRetrieval | precision_at_5 | 18.201 |
C-MTEB/MMarcoRetrieval | recall_at_1 | 66.83099999999999 |
C-MTEB/MMarcoRetrieval | recall_at_10 | 90.118 |
C-MTEB/MMarcoRetrieval | recall_at_100 | 96.377 |
C-MTEB/MMarcoRetrieval | recall_at_1000 | 98.656 |
C-MTEB/MMarcoRetrieval | recall_at_3 | 81.516 |
C-MTEB/MMarcoRetrieval | recall_at_5 | 85.798 |
C-MTEB/MedicalRetrieval | map_at_1 | 51.2 |
C-MTEB/MedicalRetrieval | map_at_10 | 57.715 |
C-MTEB/MedicalRetrieval | map_at_100 | 58.233999999999995 |
C-MTEB/MedicalRetrieval | map_at_1000 | 58.289 |
C-MTEB/MedicalRetrieval | map_at_3 | 56.483000000000004 |
C-MTEB/MedicalRetrieval | map_at_5 | 57.193000000000005 |
C-MTEB/MedicalRetrieval | mrr_at_1 | 51.2 |
C-MTEB/MedicalRetrieval | mrr_at_10 | 57.714 |
C-MTEB/MedicalRetrieval | mrr_at_100 | 58.233000000000004 |
C-MTEB/MedicalRetrieval | mrr_at_1000 | 58.288 |
C-MTEB/MedicalRetrieval | mrr_at_3 | 56.483000000000004 |
C-MTEB/MedicalRetrieval | mrr_at_5 | 57.193000000000005 |
C-MTEB/MedicalRetrieval | ndcg_at_1 | 51.2 |
C-MTEB/MedicalRetrieval | ndcg_at_10 | 60.63499999999999 |
C-MTEB/MedicalRetrieval | ndcg_at_100 | 63.458000000000006 |
C-MTEB/MedicalRetrieval | ndcg_at_1000 | 64.992 |
C-MTEB/MedicalRetrieval | ndcg_at_3 | 58.11300000000001 |
C-MTEB/MedicalRetrieval | ndcg_at_5 | 59.391000000000005 |
C-MTEB/MedicalRetrieval | precision_at_1 | 51.2 |
C-MTEB/MedicalRetrieval | precision_at_10 | 6.97 |
C-MTEB/MedicalRetrieval | precision_at_100 | 0.836 |
C-MTEB/MedicalRetrieval | precision_at_1000 | 0.096 |
C-MTEB/MedicalRetrieval | precision_at_3 | 20.933 |
C-MTEB/MedicalRetrieval | precision_at_5 | 13.18 |
C-MTEB/MedicalRetrieval | recall_at_1 | 51.2 |
C-MTEB/MedicalRetrieval | recall_at_10 | 69.69999999999999 |
C-MTEB/MedicalRetrieval | recall_at_100 | 83.6 |
C-MTEB/MedicalRetrieval | recall_at_1000 | 95.8 |
C-MTEB/MedicalRetrieval | recall_at_3 | 62.8 |
C-MTEB/MedicalRetrieval | recall_at_5 | 65.9 |
C-MTEB/T2Retrieval | map_at_1 | 27.423 |
C-MTEB/T2Retrieval | map_at_10 | 77.206 |
C-MTEB/T2Retrieval | map_at_100 | 80.83500000000001 |
C-MTEB/T2Retrieval | map_at_1000 | 80.9 |
C-MTEB/T2Retrieval | map_at_3 | 54.190000000000005 |
C-MTEB/T2Retrieval | map_at_5 | 66.662 |
C-MTEB/T2Retrieval | mrr_at_1 | 90.049 |
C-MTEB/T2Retrieval | mrr_at_10 | 92.48100000000001 |
C-MTEB/T2Retrieval | mrr_at_100 | 92.567 |
C-MTEB/T2Retrieval | mrr_at_1000 | 92.571 |
C-MTEB/T2Retrieval | mrr_at_3 | 92.07 |
C-MTEB/T2Retrieval | mrr_at_5 | 92.32900000000001 |
C-MTEB/T2Retrieval | ndcg_at_1 | 90.049 |
C-MTEB/T2Retrieval | ndcg_at_10 | 84.69 |
C-MTEB/T2Retrieval | ndcg_at_100 | 88.254 |
C-MTEB/T2Retrieval | ndcg_at_1000 | 88.89399999999999 |
C-MTEB/T2Retrieval | ndcg_at_3 | 86.091 |
C-MTEB/T2Retrieval | ndcg_at_5 | 84.685 |
C-MTEB/T2Retrieval | precision_at_1 | 90.049 |
C-MTEB/T2Retrieval | precision_at_10 | 42.141 |
C-MTEB/T2Retrieval | precision_at_100 | 5.016 |
C-MTEB/T2Retrieval | precision_at_1000 | 0.516 |
C-MTEB/T2Retrieval | precision_at_3 | 75.352 |
C-MTEB/T2Retrieval | precision_at_5 | 63.176 |
C-MTEB/T2Retrieval | recall_at_1 | 27.423 |
C-MTEB/T2Retrieval | recall_at_10 | 83.595 |
C-MTEB/T2Retrieval | recall_at_100 | 95.21 |
C-MTEB/T2Retrieval | recall_at_1000 | 98.503 |
C-MTEB/T2Retrieval | recall_at_3 | 55.84400000000001 |
C-MTEB/T2Retrieval | recall_at_5 | 69.987 |
C-MTEB/VideoRetrieval | map_at_1 | 56.99999999999999 |
C-MTEB/VideoRetrieval | map_at_10 | 67.611 |
C-MTEB/VideoRetrieval | map_at_100 | 68.095 |
C-MTEB/VideoRetrieval | map_at_1000 | 68.10300000000001 |
C-MTEB/VideoRetrieval | map_at_3 | 65.75 |
C-MTEB/VideoRetrieval | map_at_5 | 66.93 |
C-MTEB/VideoRetrieval | mrr_at_1 | 56.89999999999999 |
C-MTEB/VideoRetrieval | mrr_at_10 | 67.561 |
C-MTEB/VideoRetrieval | mrr_at_100 | 68.045 |
C-MTEB/VideoRetrieval | mrr_at_1000 | 68.053 |
C-MTEB/VideoRetrieval | mrr_at_3 | 65.7 |
C-MTEB/VideoRetrieval | mrr_at_5 | 66.88 |
C-MTEB/VideoRetrieval | ndcg_at_1 | 56.99999999999999 |
C-MTEB/VideoRetrieval | ndcg_at_10 | 72.25200000000001 |
C-MTEB/VideoRetrieval | ndcg_at_100 | 74.542 |
C-MTEB/VideoRetrieval | ndcg_at_1000 | 74.725 |
C-MTEB/VideoRetrieval | ndcg_at_3 | 68.47 |
C-MTEB/VideoRetrieval | ndcg_at_5 | 70.583 |
C-MTEB/VideoRetrieval | precision_at_1 | 56.99999999999999 |
C-MTEB/VideoRetrieval | precision_at_10 | 8.66 |
C-MTEB/VideoRetrieval | precision_at_100 | 0.972 |
C-MTEB/VideoRetrieval | precision_at_1000 | 0.099 |
C-MTEB/VideoRetrieval | precision_at_3 | 25.433 |
C-MTEB/VideoRetrieval | precision_at_5 | 16.28 |
C-MTEB/VideoRetrieval | recall_at_1 | 56.99999999999999 |
C-MTEB/VideoRetrieval | recall_at_10 | 86.6 |
C-MTEB/VideoRetrieval | recall_at_100 | 97.2 |
C-MTEB/VideoRetrieval | recall_at_1000 | 98.6 |
C-MTEB/VideoRetrieval | recall_at_3 | 76.3 |
C-MTEB/VideoRetrieval | recall_at_5 | 81.39999999999999 |
#### Pair Classification

Dataset | Metric | Value |
---|---|---|
C-MTEB/CMNLI | cos_sim_accuracy | 77.71497294046902 |
C-MTEB/CMNLI | cos_sim_ap | 86.84526989595028 |
C-MTEB/CMNLI | cos_sim_f1 | 79.31987247608926 |
C-MTEB/CMNLI | cos_sim_precision | 72.70601987142022 |
C-MTEB/CMNLI | cos_sim_recall | 87.2574234276362 |
C-MTEB/CMNLI | dot_accuracy | 77.71497294046902 |
C-MTEB/CMNLI | dot_ap | 86.83880734247957 |
C-MTEB/CMNLI | dot_f1 | 79.31987247608926 |
C-MTEB/CMNLI | dot_precision | 72.70601987142022 |
C-MTEB/CMNLI | dot_recall | 87.2574234276362 |
C-MTEB/CMNLI | euclidean_accuracy | 77.71497294046902 |
C-MTEB/CMNLI | euclidean_ap | 86.84526869685902 |
C-MTEB/CMNLI | euclidean_f1 | 79.31987247608926 |
C-MTEB/CMNLI | euclidean_precision | 72.70601987142022 |
C-MTEB/CMNLI | euclidean_recall | 87.2574234276362 |
C-MTEB/CMNLI | manhattan_accuracy | 77.8111846061335 |
C-MTEB/CMNLI | manhattan_ap | 86.81142881585656 |
C-MTEB/CMNLI | manhattan_f1 | 79.4201671780764 |
C-MTEB/CMNLI | manhattan_precision | 72.53575570158485 |
C-MTEB/CMNLI | manhattan_recall | 87.74842179097499 |
C-MTEB/CMNLI | max_accuracy | 77.8111846061335 |
C-MTEB/CMNLI | max_ap | 86.84526989595028 |
C-MTEB/CMNLI | max_f1 | 79.4201671780764 |
C-MTEB/OCNLI | cos_sim_accuracy | 73.36220898754738 |
C-MTEB/OCNLI | cos_sim_ap | 78.50045169678386 |
C-MTEB/OCNLI | cos_sim_f1 | 75.3875968992248 |
C-MTEB/OCNLI | cos_sim_precision | 69.65085049239033 |
C-MTEB/OCNLI | cos_sim_recall | 82.15417106652588 |
C-MTEB/OCNLI | dot_accuracy | 73.36220898754738 |
C-MTEB/OCNLI | dot_ap | 78.50039148302838 |
C-MTEB/OCNLI | dot_f1 | 75.3875968992248 |
C-MTEB/OCNLI | dot_precision | 69.65085049239033 |
C-MTEB/OCNLI | dot_recall | 82.15417106652588 |
C-MTEB/OCNLI | euclidean_accuracy | 73.36220898754738 |
C-MTEB/OCNLI | euclidean_ap | 78.50045169678386 |
C-MTEB/OCNLI | euclidean_f1 | 75.3875968992248 |
C-MTEB/OCNLI | euclidean_precision | 69.65085049239033 |
C-MTEB/OCNLI | euclidean_recall | 82.15417106652588 |
C-MTEB/OCNLI | manhattan_accuracy | 73.09149972929075 |
C-MTEB/OCNLI | manhattan_ap | 78.40911589236852 |
C-MTEB/OCNLI | manhattan_f1 | 75.3623188405797 |
C-MTEB/OCNLI | manhattan_precision | 69.45681211041853 |
C-MTEB/OCNLI | manhattan_recall | 82.36536430834214 |
C-MTEB/OCNLI | max_accuracy | 73.36220898754738 |
C-MTEB/OCNLI | max_ap | 78.50045169678386 |
C-MTEB/OCNLI | max_f1 | 75.3875968992248 |
## 🔧 Technical Details

### Method

The model follows the same method as stella-v2, with the context length extended on top of tao.

### Context-Length Note

To use the full 8k context, you may need to convert the model to float32, as sketched below.
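A minimal sketch of that conversion, assuming a sentence-transformers checkpoint; `nn.Module.float()` casts every parameter to float32 in place, at roughly double the memory of a half-precision load.

```python
# Hedged sketch: force float32 before encoding near the 8k-token limit.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("amu/tao-8k")  # assumed hub id
model = model.float()  # cast all weights to float32

long_text = "..."  # a document up to roughly 8k tokens
embedding = model.encode(long_text)
print(embedding.shape)
```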
## 📄 License

This model is released under the Apache-2.0 license.
> ⚠️ **Important**
>
> To use the full 8k context, you may need to convert the model to float32.
> 💡 **Tip**
>
> Watch for the release of tao-v2, which may bring better performance and new features.