🚀 Gouzi3618/LENS-8000
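Gouzi3618/LENS-8000 is a Transformer-based model for text embedding, feature extraction, and sentence similarity. It has been evaluated on multiple MTEB datasets, with strong results across classification, retrieval, clustering, reranking, and semantic textual similarity (STS) tasks.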
📚 Documentation
Model Tags
- Text embedding (text-embedding)
- Feature extraction (feature-extraction)
- Sentence similarity (sentence-similarity)
- Transformer architecture (transformers)
- MTEB benchmark (mteb)
Model Evaluation Results
1. Classification
| Dataset | Accuracy (accuracy) | Average precision (ap) | Weighted average precision (ap_weighted) | F1 (f1) | Weighted F1 (f1_weighted) | Main score (main_score) |
|---|---|---|---|---|---|---|
| MTEB AmazonCounterfactualClassification (en) | 93.6865671641791 | 74.44778735403261 | 74.44778735403261 | 90.57338628851295 | 93.87207694461506 | 93.6865671641791 |
| MTEB AmazonPolarityClassification (default) | 97.06832499999999 | 95.71019538629211 | 95.71019538629211 | 97.06781792337515 | 97.06781792337515 | 97.06832499999999 |
| MTEB AmazonReviewsClassification (en) | 63.608 | - | - | 62.41274991021244 | 62.41274991021244 | 63.608 |
| MTEB Banking77Classification (default) | 90.19155844155846 | - | - | 90.05716678902826 | 90.05716678902826 | 90.19155844155846 |
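
To make the accuracy and F1 columns above concrete, here is a minimal sketch of how such scores can be computed from predicted labels. It assumes that "F1" means the unweighted (macro) average of per-class F1 and that "weighted F1" weights each class by its support; the model card does not state this, and the labels and helper names (`per_class_f1`, `macro_f1`, `weighted_f1`) are invented for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """F1 for each label, computed as 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of the per-class F1 scores, weighted by class support in y_true."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 0]
y_pred = [1, 1, 0, 0]

acc = accuracy(y_true, y_pred)
macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
print(acc, macro, weighted)
```

On a balanced dataset the macro and weighted F1 coincide, which would be consistent with rows where the two columns show the same value.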
2. Retrieval
| Dataset | Main score (main_score) | MAP@1 | MAP@10 | MAP@100 | MAP@1000 | MAP@20 | MAP@3 | MAP@5 | MRR@1 | MRR@10 | MRR@100 | MRR@1000 | MRR@20 | MRR@3 | MRR@5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MTEB ArguAna (default) | 76.019 | 55.903000000000006 | 69.887 | 70.157 | 70.159 | 70.101 | 67.378 | 69.138 | 56.899004267425326 | 70.23428503691676 | 70.50477756895107 | 70.5063694836776 | 70.44906432331086 | 67.73352299668105 | 69.46183025130412 |
| MTEB CQADupstackAndroidRetrieval (default) | 57.9 | 37.412 | 51.01599999999999 | 52.61900000000001 | 52.708 | 51.928 | 46.685 | 49.105 | 46.20886981402003 | 56.82409110520696 | 57.489735501152694 | 57.51438904427485 | 57.25804902449886 | 54.10109680495945 | 55.76061039580349 |
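
The MAP@k and MRR@k columns above are rank-based retrieval metrics averaged over queries. The following is a minimal sketch of both, not the evaluation code used for this model. The document IDs and relevance judgments are invented, and the average-precision normalization (dividing by `min(number of relevant documents, k)`) is an assumption; some evaluation toolkits divide by the total number of relevant documents instead.

```python
def reciprocal_rank_at_k(ranked_ids, relevant_ids, k):
    """1 / rank of the first relevant document within the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def average_precision_at_k(ranked_ids, relevant_ids, k):
    """Mean of precision@rank over the ranks (<= k) that hold a relevant document."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    # Assumed normalization; conventions differ between toolkits.
    denom = min(len(relevant_ids), k)
    return precision_sum / denom if denom else 0.0


# Toy run: each query maps to (ranked result IDs, set of relevant IDs).
# All IDs are hypothetical.
run = {
    "q1": (["d3", "d1", "d2"], {"d1"}),
    "q2": (["d5", "d4", "d6"], {"d5", "d6"}),
}

k = 3
mrr_at_k = sum(reciprocal_rank_at_k(r, rel, k) for r, rel in run.values()) / len(run)
map_at_k = sum(average_precision_at_k(r, rel, k) for r, rel in run.values()) / len(run)
print(f"MRR@{k} = {mrr_at_k:.4f}, MAP@{k} = {map_at_k:.4f}")
```

Both metrics are non-decreasing in k for a fixed ranking under this normalization only when the number of relevant documents does not exceed k, so values at different cutoffs need not be ordered in general.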
3. Clustering
| Dataset | Main score (main_score) | V-measure | V-measure standard deviation |
|---|---|---|---|
| MTEB ArxivClusteringP2P (default) | 54.809064728970625 | 54.809064728970625 | 14.497861425102215 |
| MTEB ArxivClusteringS2S (default) | 50.144159631474416 | 50.144159631474416 | 14.596959041091187 |
| MTEB BiorxivClusteringP2P (default) | 52.480294793961924 | 52.480294793961924 | 0.5558452294416437 |
| MTEB BiorxivClusteringS2S (default) | 48.51901581759115 | 48.51901581759115 | 1.1094735884191569 |
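
V-measure, the clustering score reported above, is the harmonic mean of homogeneity (each cluster contains a single class) and completeness (each class falls in a single cluster). Below is a minimal standard-library sketch of that definition. The labels are invented, the function names are hypothetical, and this is not the evaluation code used for the model.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given), computed from the joint label counts."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum(
        (c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def v_measure(true_labels, cluster_labels):
    """Harmonic mean of homogeneity and completeness (equal weighting)."""
    h_true = entropy(true_labels)
    h_cluster = entropy(cluster_labels)
    homogeneity = (
        1.0 if h_true == 0
        else 1.0 - conditional_entropy(true_labels, cluster_labels) / h_true
    )
    completeness = (
        1.0 if h_cluster == 0
        else 1.0 - conditional_entropy(cluster_labels, true_labels) / h_cluster
    )
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Cluster IDs are arbitrary: a relabeled but otherwise perfect clustering
# scores the same as an identical labeling.
perfect = v_measure([0, 0, 1, 1], [1, 1, 0, 0])
# Putting everything in one cluster is complete but not homogeneous.
collapsed = v_measure([0, 0, 1, 1], [0, 0, 0, 0])
print(perfect, collapsed)
```

The table's values look like this score scaled to a 0 to 100 range; that reading is an assumption.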
4. Reranking
| Dataset | Main score (main_score) | MAP | MRR | nAUC_map_diff1 | nAUC_map_max | nAUC_map_std | nAUC_mrr_diff1 | nAUC_mrr_max | nAUC_mrr_std |
|---|---|---|---|---|---|---|---|---|---|
| MTEB AskUbuntuDupQuestions (default) | 65.74396432331054 | 65.74396432331054 | 77.89418722244206 | 22.172664271824022 | 22.232980127036896 | 22.763425465824056 | 30.670095862543384 | 34.51981156443003 | 28.863440464092747 |
5. Semantic Textual Similarity (STS)
| Dataset | Cosine Pearson (cosine_pearson) | Cosine Spearman (cosine_spearman) | Euclidean Pearson (euclidean_pearson) | Euclidean Spearman (euclidean_spearman) | Main score (main_score) | Manhattan Pearson (manhattan_pearson) | Manhattan Spearman (manhattan_spearman) |
|---|---|---|---|---|---|---|---|
| MTEB BIOSSES (default) | 86.59612727828603 | 85.83087137728063 | 84.64267159338176 | 85.83087137728063 | 85.83087137728063 | 85.70909201286793 | 85.96460936435044 |
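
The correlation columns above compare similarity scores derived from sentence embeddings with human-annotated gold scores. As a rough illustration of the cosine Spearman variant, here is a minimal standard-library sketch. The vectors and gold scores are invented toy data rather than output of LENS-8000, the helper names are hypothetical, and the final multiplication by 100 is an assumption based on the scale the table appears to use.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Ranks from 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy embedding pairs and gold similarity scores, invented for illustration.
pairs = [
    ([1.0, 0.0], [1.0, 0.1]),
    ([1.0, 0.0], [0.5, 0.5]),
    ([1.0, 0.0], [0.0, 1.0]),
]
gold = [4.8, 2.5, 0.2]

predicted = [cosine_similarity(u, v) for u, v in pairs]
cosine_spearman = 100 * spearman(predicted, gold)  # assumed 0-100 scaling
print(cosine_spearman)
```

Because Spearman correlation depends only on rank order, any similarity function that orders the pairs identically yields the same score, which may explain why the cosine and Euclidean Spearman columns in the table match.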
📄 License
This model is released under the Apache-2.0 license.