Xiaobu Embedding
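Developed by lier007
xiaobu-embedding is a multi-task embedding model for Chinese natural language processing tasks, including text similarity, classification, clustering, and retrieval.
Downloads: 147
Released: 1/9/2024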
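Model Overview
The model learns embedding representations of Chinese text. The vectors it produces can be used for downstream tasks such as semantic similarity, text classification, and information retrieval.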
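Model Features
Multi-task support
Supports several natural language processing tasks, including text similarity, classification, clustering, and retrieval.
Optimized for Chinese
Tuned specifically for Chinese text to better capture Chinese semantic features.
Benchmark performance
Evaluated on the C-MTEB Chinese benchmark suite. On the medical reranking tasks it reports MAP of 87.12 (CMedQAv1) and 87.57 (CMedQAv2).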
Model Capabilities
Text embedding
Semantic similarity
Text classification
Text clustering
Information retrieval
Reranking
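Use Cases
Medical
Medical question-answer retrieval
Retrieves answers to medical questions
MAP@10 of 37.604 on the CmedqaRetrieval task
Medical document reranking
Reranks medical documents by relevance
MAP of 87.57 on the CMedQAv2 reranking task
E-commerce
Product review classification
Classifies product reviews by sentiment and topic
Accuracy of 86.74% on the JDReview classification task
Product retrieval
Product search on e-commerce platforms
MAP@10 of 63.14 on the EcomRetrieval task
General NLP
Text similarity
Computes the semantic similarity of two texts
Cosine-similarity Pearson correlation of 79.75 on the STSB task
Text classification
Multi-class text classification
Accuracy of 49.74% on the IFlyTek classification task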
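The STS figures on this page are correlations between the model's cosine similarities and human similarity labels. The sketch below shows how such `cos_sim_pearson` and `cos_sim_spearman` style scores can be computed for a handful of sentence pairs. The vectors and gold labels are made up for illustration, the helper names are hypothetical, and benchmark tooling typically reports the result multiplied by 100.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Toy data (made up): three sentence pairs represented as 2-d vectors.
pairs = [
    ([1.0, 0.0], [1.0, 0.1]),
    ([1.0, 0.0], [0.5, 0.5]),
    ([1.0, 0.0], [0.0, 1.0]),
]
gold = [5.0, 3.0, 0.0]  # hypothetical human similarity labels

predicted = [cosine_similarity(u, v) for u, v in pairs]
print("pearson:", pearson(predicted, gold))
print("spearman:", spearman(predicted, gold))
```

Pearson rewards a linear relationship between predicted and gold scores, while Spearman only checks that the ordering agrees, which is why the two columns can differ for the same dataset.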
🚀 xiaobu-embedding
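The model is a multi-task fine-tune of the GTE model. Training used several kinds of data: chit-chat Query-Query pairs, knowledge-oriented Query-Doc pairs, and the open-source BGE Query-Doc data. Positive examples were cleaned and medium-difficulty negatives were mined, for a total of 6M examples, with the emphasis placed on data quality.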
🚀 Quick Start
Install dependencies
pip install -U sentence-transformers
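Similarity example
from sentence_transformers import SentenceTransformer
# Placeholder inputs; replace them with your own Chinese sentences.
sentences_1 = ["样例数据-1", "样例数据-2"]
sentences_2 = ["样例数据-3", "样例数据-4"]
model = SentenceTransformer('lier007/xiaobu-embedding')
# normalize_embeddings=True returns unit-length vectors,
# so the dot product below equals cosine similarity.
embeddings_1 = model.encode(sentences_1, normalize_embeddings=True)
embeddings_2 = model.encode(sentences_2, normalize_embeddings=True)
# similarity[i][j] scores sentences_1[i] against sentences_2[j].
similarity = embeddings_1 @ embeddings_2.T
print(similarity)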
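Each row of the similarity matrix holds the scores of one query sentence against every candidate, so retrieval reduces to sorting a row. The following is a minimal sketch of that step. `top_k` is an illustrative helper, not part of sentence-transformers, and the scores are invented so the snippet runs without downloading the model.

```python
def top_k(similarity_row, k):
    """Return (index, score) pairs for the k highest scores, best first."""
    ranked = sorted(enumerate(similarity_row), key=lambda pair: pair[1], reverse=True)
    return [(index, float(score)) for index, score in ranked[:k]]

# Hypothetical similarity row for one query against four candidates.
row = [0.12, 0.87, 0.45, 0.91]
print(top_k(row, 2))  # [(3, 0.91), (1, 0.87)]
```

With the matrix from the example above, `top_k(similarity[0], 2)` would give the two entries of `sentences_2` closest to the first sentence in `sentences_1`.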
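✨ Key Features
- Base model: a multi-task fine-tune of the GTE model [1].
- Training data: chit-chat Query-Query pairs, knowledge-oriented Query-Doc pairs, and the open-source BGE Query-Doc data [2]; positive examples were cleaned and medium-difficulty negatives were mined; 6M examples in total, with the emphasis on data quality.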
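The page does not say how the medium-difficulty negatives were mined. One common approach, shown here purely as an assumption, is to rank candidates by similarity to the query, skip the very top (which may hide unlabeled positives or overly hard cases), and take negatives from the band that follows. The function name, parameters, and scores below are hypothetical.

```python
def mine_medium_negatives(scores, positive_ids, skip_top=2, keep=3):
    """Pick negatives from a middle band of the similarity ranking.

    scores: dict mapping candidate id -> similarity to the query.
    positive_ids: ids known to be relevant; never returned as negatives.
    skip_top: number of highest-scoring non-positives to skip.
    keep: number of negatives to return after the skipped ones.
    """
    candidates = [(cid, s) for cid, s in scores.items() if cid not in positive_ids]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [cid for cid, _ in candidates[skip_top:skip_top + keep]]

# Invented similarity scores for one query; "d1" is the labeled positive.
scores = {
    "d1": 0.95, "d2": 0.90, "d3": 0.80, "d4": 0.60,
    "d5": 0.55, "d6": 0.10, "d7": 0.05,
}
print(mine_medium_negatives(scores, positive_ids={"d1"}, skip_top=1, keep=3))
# ['d3', 'd4', 'd5']
```

The band limits trade off two risks: a band too close to the top admits false negatives, while one too far down yields negatives that are trivially easy to separate.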
📚 Documentation
Evaluation
See the BGE C-MTEB Chinese evaluation [2].
Fine-tuning
See the BGE fine-tuning module [2].
📄 Evaluation Metrics
Task type | Dataset | Metric | Value |
---|---|---|---|
STS | C-MTEB/AFQMC | cos_sim_pearson | 49.37874132528482 |
STS | C-MTEB/AFQMC | cos_sim_spearman | 54.84722470052176 |
STS | C-MTEB/AFQMC | euclidean_pearson | 53.0495882931575 |
STS | C-MTEB/AFQMC | euclidean_spearman | 54.847727301700665 |
STS | C-MTEB/AFQMC | manhattan_pearson | 53.0632140838278 |
STS | C-MTEB/AFQMC | manhattan_spearman | 54.8744258024692 |
STS | C-MTEB/ATEC | cos_sim_pearson | 48.15992903013723 |
STS | C-MTEB/ATEC | cos_sim_spearman | 55.13198035464577 |
STS | C-MTEB/ATEC | euclidean_pearson | 55.435876753245715 |
STS | C-MTEB/ATEC | euclidean_spearman | 55.13215936702871 |
STS | C-MTEB/ATEC | manhattan_pearson | 55.41429518223402 |
STS | C-MTEB/ATEC | manhattan_spearman | 55.13363087679285 |
Classification | mteb/amazon_reviews_multi | accuracy | 46.722 |
Classification | mteb/amazon_reviews_multi | f1 | 45.039340641893205 |
STS | C-MTEB/BQ | cos_sim_pearson | 63.517830355554224 |
STS | C-MTEB/BQ | cos_sim_spearman | 65.57007801018649 |
STS | C-MTEB/BQ | euclidean_pearson | 64.05153340906585 |
STS | C-MTEB/BQ | euclidean_spearman | 65.5696865661119 |
STS | C-MTEB/BQ | manhattan_pearson | 63.95710619755406 |
STS | C-MTEB/BQ | manhattan_spearman | 65.48565785379489 |
Clustering | C-MTEB/CLSClusteringP2P | v_measure | 43.24046498507819 |
Clustering | C-MTEB/CLSClusteringS2S | v_measure | 41.22618199372116 |
Reranking | C-MTEB/CMedQAv1-reranking | map | 87.12213224673621 |
Reranking | C-MTEB/CMedQAv1-reranking | mrr | 89.57150793650794 |
Reranking | C-MTEB/CMedQAv2-reranking | map | 87.57290061886421 |
Reranking | C-MTEB/CMedQAv2-reranking | mrr | 90.19202380952382 |
Retrieval | C-MTEB/CmedqaRetrieval | map_at_1 | 25.22 |
Retrieval | C-MTEB/CmedqaRetrieval | map_at_10 | 37.604 |
Retrieval | C-MTEB/CmedqaRetrieval | map_at_100 | 39.501 |
Retrieval | C-MTEB/CmedqaRetrieval | map_at_1000 | 39.614 |
Retrieval | C-MTEB/CmedqaRetrieval | map_at_3 | 33.378 |
Retrieval | C-MTEB/CmedqaRetrieval | map_at_5 | 35.774 |
Retrieval | C-MTEB/CmedqaRetrieval | mrr_at_1 | 38.385000000000005 |
Retrieval | C-MTEB/CmedqaRetrieval | mrr_at_10 | 46.487 |
Retrieval | C-MTEB/CmedqaRetrieval | mrr_at_100 | 47.504999999999995 |
Retrieval | C-MTEB/CmedqaRetrieval | mrr_at_1000 | 47.548 |
Retrieval | C-MTEB/CmedqaRetrieval | mrr_at_3 | 43.885999999999996 |
Retrieval | C-MTEB/CmedqaRetrieval | mrr_at_5 | 45.373000000000005 |
Retrieval | C-MTEB/CmedqaRetrieval | ndcg_at_1 | 38.385000000000005 |
Retrieval | C-MTEB/CmedqaRetrieval | ndcg_at_10 | 44.224999999999994 |
Retrieval | C-MTEB/CmedqaRetrieval | ndcg_at_100 | 51.637 |
Retrieval | C-MTEB/CmedqaRetrieval | ndcg_at_1000 | 53.55799999999999 |
Retrieval | C-MTEB/CmedqaRetrieval | ndcg_at_3 | 38.845 |
Retrieval | C-MTEB/CmedqaRetrieval | ndcg_at_5 | 41.163 |
Retrieval | C-MTEB/CmedqaRetrieval | precision_at_1 | 38.385000000000005 |
Retrieval | C-MTEB/CmedqaRetrieval | precision_at_10 | 9.812 |
Retrieval | C-MTEB/CmedqaRetrieval | precision_at_100 | 1.58 |
Retrieval | C-MTEB/CmedqaRetrieval | precision_at_1000 | 0.183 |
Retrieval | C-MTEB/CmedqaRetrieval | precision_at_3 | 21.88 |
Retrieval | C-MTEB/CmedqaRetrieval | precision_at_5 | 15.974 |
Retrieval | C-MTEB/CmedqaRetrieval | recall_at_1 | 25.22 |
Retrieval | C-MTEB/CmedqaRetrieval | recall_at_10 | 54.897 |
Retrieval | C-MTEB/CmedqaRetrieval | recall_at_100 | 85.469 |
Retrieval | C-MTEB/CmedqaRetrieval | recall_at_1000 | 98.18599999999999 |
Retrieval | C-MTEB/CmedqaRetrieval | recall_at_3 | 38.815 |
Retrieval | C-MTEB/CmedqaRetrieval | recall_at_5 | 45.885 |
PairClassification | C-MTEB/CMNLI | cos_sim_accuracy | 83.22309079975948 |
PairClassification | C-MTEB/CMNLI | cos_sim_ap | 89.94833400328307 |
PairClassification | C-MTEB/CMNLI | cos_sim_f1 | 84.39319055464031 |
PairClassification | C-MTEB/CMNLI | cos_sim_precision | 79.5774647887324 |
PairClassification | C-MTEB/CMNLI | cos_sim_recall | 89.82931961655366 |
PairClassification | C-MTEB/CMNLI | dot_accuracy | 83.22309079975948 |
PairClassification | C-MTEB/CMNLI | dot_ap | 89.95618559578415 |
PairClassification | C-MTEB/CMNLI | dot_f1 | 84.41173239591345 |
PairClassification | C-MTEB/CMNLI | dot_precision | 79.61044343141317 |
PairClassification | C-MTEB/CMNLI | dot_recall | 89.82931961655366 |
PairClassification | C-MTEB/CMNLI | euclidean_accuracy | 83.23511725796753 |
PairClassification | C-MTEB/CMNLI | euclidean_ap | 89.94836342787318 |
PairClassification | C-MTEB/CMNLI | euclidean_f1 | 84.40550133096718 |
PairClassification | C-MTEB/CMNLI | euclidean_precision | 80.29120067524794 |
PairClassification | C-MTEB/CMNLI | euclidean_recall | 88.9642272620996 |
PairClassification | C-MTEB/CMNLI | manhattan_accuracy | 83.23511725796753 |
PairClassification | C-MTEB/CMNLI | manhattan_ap | 89.9450103956978 |
PairClassification | C-MTEB/CMNLI | manhattan_f1 | 84.44444444444444 |
PairClassification | C-MTEB/CMNLI | manhattan_precision | 80.09647651006712 |
PairClassification | C-MTEB/CMNLI | manhattan_recall | 89.29155950432546 |
PairClassification | C-MTEB/CMNLI | max_accuracy | 83.23511725796753 |
PairClassification | C-MTEB/CMNLI | max_ap | 89.95618559578415 |
PairClassification | C-MTEB/CMNLI | max_f1 | 84.44444444444444 |
Retrieval | C-MTEB/CovidRetrieval | map_at_1 | 76.87 |
Retrieval | C-MTEB/CovidRetrieval | map_at_10 | 84.502 |
Retrieval | C-MTEB/CovidRetrieval | map_at_100 | 84.615 |
Retrieval | C-MTEB/CovidRetrieval | map_at_1000 | 84.617 |
Retrieval | C-MTEB/CovidRetrieval | map_at_3 | 83.127 |
Retrieval | C-MTEB/CovidRetrieval | map_at_5 | 83.99600000000001 |
Retrieval | C-MTEB/CovidRetrieval | mrr_at_1 | 77.02799999999999 |
Retrieval | C-MTEB/CovidRetrieval | mrr_at_10 | 84.487 |
Retrieval | C-MTEB/CovidRetrieval | mrr_at_100 | 84.59299999999999 |
Retrieval | C-MTEB/CovidRetrieval | mrr_at_1000 | 84.59400000000001 |
Retrieval | C-MTEB/CovidRetrieval | mrr_at_3 | 83.193 |
Retrieval | C-MTEB/CovidRetrieval | mrr_at_5 | 83.994 |
Retrieval | C-MTEB/CovidRetrieval | ndcg_at_1 | 77.134 |
Retrieval | C-MTEB/CovidRetrieval | ndcg_at_10 | 87.68599999999999 |
Retrieval | C-MTEB/CovidRetrieval | ndcg_at_100 | 88.17099999999999 |
Retrieval | C-MTEB/CovidRetrieval | ndcg_at_1000 | 88.21 |
Retrieval | C-MTEB/CovidRetrieval | ndcg_at_3 | 84.993 |
Retrieval | C-MTEB/CovidRetrieval | ndcg_at_5 | 86.519 |
Retrieval | C-MTEB/CovidRetrieval | precision_at_1 | 77.134 |
Retrieval | C-MTEB/CovidRetrieval | precision_at_10 | 9.841999999999999 |
Retrieval | C-MTEB/CovidRetrieval | precision_at_100 | 1.006 |
Retrieval | C-MTEB/CovidRetrieval | precision_at_1000 | 0.101 |
Retrieval | C-MTEB/CovidRetrieval | precision_at_3 | 30.313000000000002 |
Retrieval | C-MTEB/CovidRetrieval | precision_at_5 | 18.945999999999998 |
Retrieval | C-MTEB/CovidRetrieval | recall_at_1 | 76.87 |
Retrieval | C-MTEB/CovidRetrieval | recall_at_10 | 97.418 |
Retrieval | C-MTEB/CovidRetrieval | recall_at_100 | 99.579 |
Retrieval | C-MTEB/CovidRetrieval | recall_at_1000 | 99.895 |
Retrieval | C-MTEB/CovidRetrieval | recall_at_3 | 90.227 |
Retrieval | C-MTEB/CovidRetrieval | recall_at_5 | 93.888 |
Retrieval | C-MTEB/DuRetrieval | map_at_1 | 25.941 |
Retrieval | C-MTEB/DuRetrieval | map_at_10 | 78.793 |
Retrieval | C-MTEB/DuRetrieval | map_at_100 | 81.57799999999999 |
Retrieval | C-MTEB/DuRetrieval | map_at_1000 | 81.626 |
Retrieval | C-MTEB/DuRetrieval | map_at_3 | 54.749 |
Retrieval | C-MTEB/DuRetrieval | map_at_5 | 69.16 |
Retrieval | C-MTEB/DuRetrieval | mrr_at_1 | 90.45 |
Retrieval | C-MTEB/DuRetrieval | mrr_at_10 | 93.406 |
Retrieval | C-MTEB/DuRetrieval | mrr_at_100 | 93.453 |
Retrieval | C-MTEB/DuRetrieval | mrr_at_1000 | 93.45700000000001 |
Retrieval | C-MTEB/DuRetrieval | mrr_at_3 | 93.10000000000001 |
Retrieval | C-MTEB/DuRetrieval | mrr_at_5 | 93.27499999999999 |
Retrieval | C-MTEB/DuRetrieval | ndcg_at_1 | 90.45 |
Retrieval | C-MTEB/DuRetrieval | ndcg_at_10 | 86.44500000000001 |
Retrieval | C-MTEB/DuRetrieval | ndcg_at_100 | 89.28399999999999 |
Retrieval | C-MTEB/DuRetrieval | ndcg_at_1000 | 89.739 |
Retrieval | C-MTEB/DuRetrieval | ndcg_at_3 | 85.62100000000001 |
Retrieval | C-MTEB/DuRetrieval | ndcg_at_5 | 84.441 |
Retrieval | C-MTEB/DuRetrieval | precision_at_1 | 90.45 |
Retrieval | C-MTEB/DuRetrieval | precision_at_10 | 41.19 |
Retrieval | C-MTEB/DuRetrieval | precision_at_100 | 4.761 |
Retrieval | C-MTEB/DuRetrieval | precision_at_1000 | 0.48700000000000004 |
Retrieval | C-MTEB/DuRetrieval | precision_at_3 | 76.583 |
Retrieval | C-MTEB/DuRetrieval | precision_at_5 | 64.68 |
Retrieval | C-MTEB/DuRetrieval | recall_at_1 | 25.941 |
Retrieval | C-MTEB/DuRetrieval | recall_at_10 | 87.443 |
Retrieval | C-MTEB/DuRetrieval | recall_at_100 | 96.54 |
Retrieval | C-MTEB/DuRetrieval | recall_at_1000 | 98.906 |
Retrieval | C-MTEB/DuRetrieval | recall_at_3 | 56.947 |
Retrieval | C-MTEB/DuRetrieval | recall_at_5 | 73.714 |
Retrieval | C-MTEB/EcomRetrieval | map_at_1 | 52.900000000000006 |
Retrieval | C-MTEB/EcomRetrieval | map_at_10 | 63.144 |
Retrieval | C-MTEB/EcomRetrieval | map_at_100 | 63.634 |
Retrieval | C-MTEB/EcomRetrieval | map_at_1000 | 63.644999999999996 |
Retrieval | C-MTEB/EcomRetrieval | map_at_3 | 60.817 |
Retrieval | C-MTEB/EcomRetrieval | map_at_5 | 62.202 |
Retrieval | C-MTEB/EcomRetrieval | mrr_at_1 | 52.900000000000006 |
Retrieval | C-MTEB/EcomRetrieval | mrr_at_10 | 63.144 |
Retrieval | C-MTEB/EcomRetrieval | mrr_at_100 | 63.634 |
Retrieval | C-MTEB/EcomRetrieval | mrr_at_1000 | 63.644999999999996 |
Retrieval | C-MTEB/EcomRetrieval | mrr_at_3 | 60.817 |
Retrieval | C-MTEB/EcomRetrieval | mrr_at_5 | 62.202 |
Retrieval | C-MTEB/EcomRetrieval | ndcg_at_1 | 52.900000000000006 |
Retrieval | C-MTEB/EcomRetrieval | ndcg_at_10 | 68.042 |
Retrieval | C-MTEB/EcomRetrieval | ndcg_at_100 | 70.417 |
Retrieval | C-MTEB/EcomRetrieval | ndcg_at_1000 | 70.722 |
Retrieval | C-MTEB/EcomRetrieval | ndcg_at_3 | 63.287000000000006 |
Retrieval | C-MTEB/EcomRetrieval | ndcg_at_5 | 65.77 |
Retrieval | C-MTEB/EcomRetrieval | precision_at_1 | 52.900000000000006 |
Retrieval | C-MTEB/EcomRetrieval | precision_at_10 | 8.34 |
Retrieval | C-MTEB/EcomRetrieval | precision_at_100 | 0.9450000000000001 |
Retrieval | C-MTEB/EcomRetrieval | precision_at_1000 | 0.097 |
Retrieval | C-MTEB/EcomRetrieval | precision_at_3 | 23.467 |
Retrieval | C-MTEB/EcomRetrieval | precision_at_5 | 15.28 |
Retrieval | C-MTEB/EcomRetrieval | recall_at_1 | 52.900000000000006 |
Retrieval | C-MTEB/EcomRetrieval | recall_at_10 | 83.39999999999999 |
Retrieval | C-MTEB/EcomRetrieval | recall_at_100 | 94.5 |
Retrieval | C-MTEB/EcomRetrieval | recall_at_1000 | 96.89999999999999 |
Retrieval | C-MTEB/EcomRetrieval | recall_at_3 | 70.39999999999999 |
Retrieval | C-MTEB/EcomRetrieval | recall_at_5 | 76.4 |
Classification | C-MTEB/IFlyTek-classification | accuracy | 49.74220854174683 |
Classification | C-MTEB/IFlyTek-classification | f1 | 38.01399980618159 |
Classification | C-MTEB/JDReview-classification | accuracy | 86.73545966228893 |
Classification | C-MTEB/JDReview-classification | ap | 55.72394235169542 |
Classification | C-MTEB/JDReview-classification | f1 | 81.58550390953492 |
STS | C-MTEB/LCQMC | cos_sim_pearson | 69.96711977441642 |
STS | C-MTEB/LCQMC | cos_sim_spearman | 75.54747609685569 |
STS | C-MTEB/LCQMC | euclidean_pearson | 74.62663478056035 |
STS | C-MTEB/LCQMC | euclidean_spearman | 75.54761576699639 |
STS | C-MTEB/LCQMC | manhattan_pearson | 74.60983904582241 |
STS | C-MTEB/LCQMC | manhattan_spearman | 75.52758938061503 |
Reranking | C-MTEB/Mmarco-reranking | map | 28.076927649720986 |
Reranking | C-MTEB/Mmarco-reranking | mrr | 26.98015873015873 |
Retrieval | C-MTEB/MMarcoRetrieval | map_at_1 | 65.58 |
Retrieval | C-MTEB/MMarcoRetrieval | map_at_10 | 74.763 |
Retrieval | C-MTEB/MMarcoRetrieval | map_at_100 | 75.077 |
Retrieval | C-MTEB/MMarcoRetrieval | map_at_1000 | 75.091 |
Retrieval | C-MTEB/MMarcoRetrieval | map_at_3 | 72.982 |
Retrieval | C-MTEB/MMarcoRetrieval | map_at_5 | 74.155 |
Retrieval | C-MTEB/MMarcoRetrieval | mrr_at_1 | 67.822 |
Retrieval | C-MTEB/MMarcoRetrieval | mrr_at_10 | 75.437 |
Retrieval | C-MTEB/MMarcoRetrieval | mrr_at_100 | 75.702 |
Retrieval | C-MTEB/MMarcoRetrieval | mrr_at_1000 | 75.715 |
Retrieval | C-MTEB/MMarcoRetrieval | mrr_at_3 | 73.91799999999999 |
Retrieval | C-MTEB/MMarcoRetrieval | mrr_at_5 | 74.909 |
Retrieval | C-MTEB/MMarcoRetrieval | ndcg_at_1 | 67.822 |
Retrieval | C-MTEB/MMarcoRetrieval | ndcg_at_10 | 78.472 |
Retrieval | C-MTEB/MMarcoRetrieval | ndcg_at_100 | 79.891 |
Retrieval | C-MTEB/MMarcoRetrieval | ndcg_at_1000 | 80.262 |
Retrieval | C-MTEB/MMarcoRetrieval | ndcg_at_3 | 75.138 |
Retrieval | C-MTEB/MMarcoRetrieval | ndcg_at_5 | 77.094 |
Retrieval | C-MTEB/MMarcoRetrieval | precision_at_1 | 67.822 |
Retrieval | C-MTEB/MMarcoRetrieval | precision_at_10 | 9.474 |
Retrieval | C-MTEB/MMarcoRetrieval | precision_at_100 | 1.019 |
Retrieval | C-MTEB/MMarcoRetrieval | precision_at_1000 | 0.105 |
Retrieval | C-MTEB/MMarcoRetrieval | precision_at_3 | 28.281 |
Retrieval | C-MTEB/MMarcoRetrieval | precision_at_5 | 18.017 |
Retrieval | C-MTEB/MMarcoRetrieval | recall_at_1 | 65.58 |
Retrieval | C-MTEB/MMarcoRetrieval | recall_at_10 | 89.18599999999999 |
Retrieval | C-MTEB/MMarcoRetrieval | recall_at_100 | 95.64399999999999 |
Retrieval | C-MTEB/MMarcoRetrieval | recall_at_1000 | 98.541 |
Retrieval | C-MTEB/MMarcoRetrieval | recall_at_3 | 80.455 |
Retrieval | C-MTEB/MMarcoRetrieval | recall_at_5 | 85.063 |
Classification | mteb/amazon_massive_intent | accuracy | 72.86819098856758 |
Classification | mteb/amazon_massive_intent | f1 | 70.25369778283451 |
Classification | mteb/amazon_massive_scenario | accuracy | 75.46738399462004 |
Classification | mteb/amazon_massive_scenario | f1 | 75.02466838130249 |
Retrieval | C-MTEB/MedicalRetrieval | map_at_1 | 53.300000000000004 |
Retrieval | C-MTEB/MedicalRetrieval | map_at_10 | 60.072 |
Retrieval | C-MTEB/MedicalRetrieval | map_at_100 | 60.618 |
Retrieval | C-MTEB/MedicalRetrieval | map_at_1000 | 60.659 |
Retrieval | C-MTEB/MedicalRetrieval | map_at_3 | 58.550000000000004 |
Retrieval | C-MTEB/MedicalRetrieval | map_at_5 | 59.425 |
Retrieval | C-MTEB/MedicalRetrieval | mrr_at_1 | 53.5 |
Retrieval | C-MTEB/MedicalRetrieval | mrr_at_10 | 60.187999999999995 |
Retrieval | C-MTEB/MedicalRetrieval | mrr_at_100 | 60.73499999999999 |
Retrieval | C-MTEB/MedicalRetrieval | mrr_at_1000 | 60.775999999999996 |
Retrieval | C-MTEB/MedicalRetrieval | mrr_at_3 | 58.667 |
Retrieval | C-MTEB/MedicalRetrieval | mrr_at_5 | 59.541999999999994 |
Retrieval | C-MTEB/MedicalRetrieval | ndcg_at_1 | 53.300000000000004 |
Retrieval | C-MTEB/MedicalRetrieval | ndcg_at_10 | 63.376999999999995 |
Retrieval | C-MTEB/MedicalRetrieval | ndcg_at_100 | 66.24600000000001 |
Retrieval | C-MTEB/MedicalRetrieval | ndcg_at_1000 | 67.408 |
Retrieval | C-MTEB/MedicalRetrieval | ndcg_at_3 | 60.211000000000006 |
Retrieval | C-MTEB/MedicalRetrieval | ndcg_at_5 | 61.781 |
Retrieval | C-MTEB/MedicalRetrieval | precision_at_1 | 53.300000000000004 |
Retrieval | C-MTEB/MedicalRetrieval | precision_at_10 | 7.380000000000001 |
Retrieval | C-MTEB/MedicalRetrieval | precision_at_100 | 0.877 |
Retrieval | C-MTEB/MedicalRetrieval | precision_at_1000 | 0.097 |
Retrieval | C-MTEB/MedicalRetrieval | precision_at_3 | 21.667 |
Retrieval | C-MTEB/MedicalRetrieval | precision_at_5 | 13.76 |
Retrieval | C-MTEB/MedicalRetrieval | recall_at_1 | 53.300000000000004 |
Retrieval | C-MTEB/MedicalRetrieval | recall_at_10 | 73.8 |
Retrieval | C-MTEB/MedicalRetrieval | recall_at_100 | 87.7 |
Retrieval | C-MTEB/MedicalRetrieval | recall_at_1000 | 97.0 |
Retrieval | C-MTEB/MedicalRetrieval | recall_at_3 | 65.0 |
Retrieval | C-MTEB/MedicalRetrieval | recall_at_5 | 68.8 |
Classification | C-MTEB/MultilingualSentiment-classification | accuracy | 76.27666666666667 |
Classification | C-MTEB/MultilingualSentiment-classification | f1 | 76.31280038435165 |
PairClassification | C-MTEB/OCNLI | cos_sim_accuracy | 78.72225230102869 |
PairClassification | C-MTEB/OCNLI | cos_sim_ap | 80.63941899467723 |
PairClassification | C-MTEB/OCNLI | cos_sim_f1 | 80.52190121155638 |
PairClassification | C-MTEB/OCNLI | cos_sim_precision | 72.06005004170142 |
PairClassification | C-MTEB/OCNLI | cos_sim_recall | 91.23548046462513 |
PairClassification | C-MTEB/OCNLI | dot_accuracy | 78.72225230102869 |
PairClassification | C-MTEB/OCNLI | dot_ap | 80.63913939812744 |
PairClassification | C-MTEB/OCNLI | dot_f1 | 80.51948051948052 |
PairClassification | C-MTEB/OCNLI | dot_precision | 71.7948717948718 |
PairClassification | C-MTEB/OCNLI | dot_recall | 91.65786694825766 |
PairClassification | C-MTEB/OCNLI | euclidean_accuracy | 78.72225230102869 |
PairClassification | C-MTEB/OCNLI | euclidean_ap | 80.64403797436798 |
PairClassification | C-MTEB/OCNLI | euclidean_f1 | 80.52190121155638 |
PairClassification | C-MTEB/OCNLI | euclidean_precision | 72.06005004170142 |
PairClassification | C-MTEB/OCNLI | euclidean_recall | 91.23548046462513 |
PairClassification | C-MTEB/OCNLI | manhattan_accuracy | 78.18083378451544 |
PairClassification | C-MTEB/OCNLI | manhattan_ap | 80.5241189302444 |
PairClassification | C-MTEB/OCNLI | manhattan_f1 | 80.43478260869566 |
PairClassification | C-MTEB/OCNLI | manhattan_precision | 72.7972626176219 |
PairClassification | C-MTEB/OCNLI | manhattan_recall | 89.86272439281943 |
PairClassification | C-MTEB/OCNLI | max_accuracy | 78.72225230102869 |
PairClassification | C-MTEB/OCNLI | max_ap | 80.64403797436798 |
PairClassification | C-MTEB/OCNLI | max_f1 | 80.52190121155638 |
Classification | C-MTEB/OnlineShopping-classification | accuracy | 92.49000000000001 |
Classification | C-MTEB/OnlineShopping-classification | ap | 90.66330807324402 |
Classification | C-MTEB/OnlineShopping-classification | f1 | 92.48245049107115 |
STS | C-MTEB/PAWSX | cos_sim_pearson | 33.6275431596535 |
STS | C-MTEB/PAWSX | cos_sim_spearman | 37.865700050451494 |
STS | C-MTEB/PAWSX | euclidean_pearson | 38.1050665279388 |
STS | C-MTEB/PAWSX | euclidean_spearman | 37.864125056066364 |
STS | C-MTEB/PAWSX | manhattan_pearson | 38.11206873232881 |
STS | C-MTEB/PAWSX | manhattan_spearman | 37.852977098473936 |
STS | C-MTEB/QBQTC | cos_sim_pearson | 32.137955501231104 |
STS | C-MTEB/QBQTC | cos_sim_spearman | 33.68610910423116 |
STS | C-MTEB/QBQTC | euclidean_pearson | 32.155444753547926 |
STS | C-MTEB/QBQTC | euclidean_spearman | 33.685799252964124 |
STS | C-MTEB/QBQTC | manhattan_pearson | 32.14490855334317 |
STS | C-MTEB/QBQTC | manhattan_spearman | 33.656549820048554 |
STS | mteb/sts22-crosslingual-sts | cos_sim_pearson | 63.63884916818661 |
STS | mteb/sts22-crosslingual-sts | cos_sim_spearman | 64.3217581571435 |
STS | mteb/sts22-crosslingual-sts | euclidean_pearson | 63.475760085926055 |
STS | mteb/sts22-crosslingual-sts | euclidean_spearman | 64.31638169371887 |
STS | mteb/sts22-crosslingual-sts | manhattan_pearson | 64.39677572604752 |
STS | mteb/sts22-crosslingual-sts | manhattan_spearman | 64.85585019406021 |
STS | C-MTEB/STSB | cos_sim_pearson | 79.74698333415277 |
STS | C-MTEB/STSB | cos_sim_spearman | 81.1850043859317 |
STS | C-MTEB/STSB | euclidean_pearson | 80.94512578669881 |
STS | C-MTEB/STSB | euclidean_spearman | 81.18825478390181 |
STS | C-MTEB/STSB | manhattan_pearson | 80.88114336824758 |
STS | C-MTEB/STSB | manhattan_spearman | 81.12266715583868 |
Reranking | C-MTEB/T2Reranking | map | 66.59971552953814 |
Reranking | C-MTEB/T2Reranking | mrr | 76.42177408088038 |
Retrieval | C-MTEB/T2Retrieval | map_at_1 | 28.825 |
Retrieval | C-MTEB/T2Retrieval | map_at_10 | 77.48899999999999 |
Retrieval | C-MTEB/T2Retrieval | map_at_100 | 81.144 |
Retrieval | C-MTEB/T2Retrieval | map_at_1000 | 81.216 |
Retrieval | C-MTEB/T2Retrieval | map_at_3 | 55.435 |
Retrieval | C-MTEB/T2Retrieval | map_at_5 | 67.496 |
Retrieval | C-MTEB/T2Retrieval | mrr_at_1 | 91.377 |
Retrieval | C-MTEB/T2Retrieval | mrr_at_10 | 94.062 |
Retrieval | C-MTEB/T2Retrieval | mrr_at_100 | 94.122 |
Retrieval | C-MTEB/T2Retrieval | mrr_at_1000 | 94.123 |
Retrieval | C-MTEB/T2Retrieval | mrr_at_3 | 93.709 |
Retrieval | C-MTEB/T2Retrieval | mrr_at_5 | 93.932 |
Retrieval | C-MTEB/T2Retrieval | ndcg_at_1 | 91.377 |
Retrieval | C-MTEB/T2Retrieval | ndcg_at_10 | 85.44800000000001 |
Retrieval | C-MTEB/T2Retrieval | ndcg_at_100 | 89.11099999999999 |
Retrieval | C-MTEB/T2Retrieval | ndcg_at_1000 | 89.752 |
Retrieval | C-MTEB/T2Retrieval | ndcg_at_3 | 87.262 |
Retrieval | C-MTEB/T2Retrieval | ndcg_at_5 | 85.668 |
Retrieval | C-MTEB/T2Retrieval | precision_at_1 | 91.377 |
Retrieval | C-MTEB/T2Retrieval | precision_at_10 | 41.525 |
Retrieval | C-MTEB/T2Retrieval | precision_at_100 | 4.989 |
Retrieval | C-MTEB/T2Retrieval | precision_at_1000 | 0.516 |
Retrieval | C-MTEB/T2Retrieval | precision_at_3 | 75.452 |
Retrieval | C-MTEB/T2Retrieval | precision_at_5 | 62.785000000000004 |
Retrieval | C-MTEB/T2Retrieval | recall_at_1 | 28.825 |
Retrieval | C-MTEB/T2Retrieval | recall_at_10 | 84.202 |
Retrieval | C-MTEB/T2Retrieval | recall_at_100 | 95.768 |
Retrieval | C-MTEB/T2Retrieval | recall_at_1000 | 98.791 |
Retrieval | C-MTEB/T2Retrieval | recall_at_3 | 57.284 |
Retrieval | C-MTEB/T2Retrieval | recall_at_5 | 71.071 |
Classification | C-MTEB/TNews-classification | accuracy | 52.160000000000004 |
Classification | C-MTEB/TNews-classification | f1 | 50.49492950548829 |
Clustering | C-MTEB/ThuNewsClusteringP2P | v_measure | 70.06019845009966 |
Clustering | C-MTEB/ThuNewsClusteringS2S | v_measure | 63.9370959228245 |
Retrieval | C-MTEB/VideoRetrieval | map_at_1 | 60.0 |
Retrieval | C-MTEB/VideoRetrieval | map_at_10 | 69.362 |
Retrieval | C-MTEB/VideoRetrieval | map_at_100 | 69.819 |
Retrieval | C-MTEB/VideoRetrieval | map_at_1000 | 69.833 |
Retrieval | C-MTEB/VideoRetrieval | map_at_3 | 67.783 |
Retrieval | C-MTEB/VideoRetrieval | map_at_5 | 68.71300000000001 |
Retrieval | C-MTEB/VideoRetrieval | mrr_at_1 | 60.0 |
Retrieval | C-MTEB/VideoRetrieval | mrr_at_10 | 69.362 |
Retrieval | C-MTEB/VideoRetrieval | mrr_at_100 | 69.819 |
Retrieval | C-MTEB/VideoRetrieval | mrr_at_1000 | 69.833 |
Retrieval | C-MTEB/VideoRetrieval | mrr_at_3 | 67.783 |
Retrieval | C-MTEB/VideoRetrieval | mrr_at_5 | 68.71300000000001 |
Retrieval | C-MTEB/VideoRetrieval | ndcg_at_1 | 60.0 |
Retrieval | C-MTEB/VideoRetrieval | ndcg_at_10 | 73.59400000000001 |
Retrieval | C-MTEB/VideoRetrieval | ndcg_at_100 | 75.734 |
Retrieval | C-MTEB/VideoRetrieval | ndcg_at_1000 | 76.049 |
Retrieval | C-MTEB/VideoRetrieval | ndcg_at_3 | 70.33 |
Retrieval | C-MTEB/VideoRetrieval | ndcg_at_5 | 72.033 |
Retrieval | C-MTEB/VideoRetrieval | precision_at_1 | 60.0 |
Retrieval | C-MTEB/VideoRetrieval | precision_at_10 | 8.67 |
Retrieval | C-MTEB/VideoRetrieval | precision_at_100 | 0.9650000000000001 |
Retrieval | C-MTEB/VideoRetrieval | precision_at_1000 | 0.099 |
Retrieval | C-MTEB/VideoRetrieval | precision_at_3 | 25.900000000000002 |
Retrieval | C-MTEB/VideoRetrieval | precision_at_5 | 16.38 |
Retrieval | C-MTEB/VideoRetrieval | recall_at_1 | 60.0 |
Retrieval | C-MTEB/VideoRetrieval | recall_at_10 | 86.7 |
Retrieval | C-MTEB/VideoRetrieval | recall_at_100 | 96.5 |
Retrieval | C-MTEB/VideoRetrieval | recall_at_1000 | 98.9 |
Retrieval | C-MTEB/VideoRetrieval | recall_at_3 | 77.7 |
Retrieval | C-MTEB/VideoRetrieval | recall_at_5 | 81.89999999999999 |
Classification | C-MTEB/waimai-classification | accuracy | 88.36 |
Classification | C-MTEB/waimai-classification | ap | 73.25144216855439 |
Classification | C-MTEB/waimai-classification | f1 | 86.75076261442027 |
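For readers unfamiliar with the retrieval columns, here is a minimal per-query sketch of three of the rank-based metrics in the table: `precision_at_k`, `recall_at_k`, and `mrr_at_k`. It uses the textbook definitions, assumes at least one relevant document per query, and is not the benchmark's own implementation. Benchmark values are averaged over all queries and appear to be reported as percentages. The document ids are invented.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents found in the top-k results."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

def mrr_at_k(ranked_ids, relevant_ids, k):
    """Reciprocal rank of the first relevant result within the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

# Invented ranking for one query; "d1" and "d2" are the relevant documents.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}

print(precision_at_k(ranked, relevant, 3))  # one hit in the top 3
print(recall_at_k(ranked, relevant, 3))     # 0.5
print(mrr_at_k(ranked, relevant, 3))        # 0.5
```

These definitions also explain a pattern in the table: the EcomRetrieval rows report `recall_at_10` of about 83.4 and `precision_at_10` of 8.34, which would be consistent with each query having exactly one relevant document, since precision at 10 is then recall at 10 divided by 10.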
📄 References