MiniCPM-Embedding-Light
Developed by openbmb
The model has been evaluated on multiple Chinese and English datasets, covering semantic textual similarity and classification tasks.
Downloads: 2,172
Released: 1/17/2025
Model Overview
This model is primarily intended for semantic textual similarity and classification tasks, and supports both Chinese and English.
Model Highlights
Multilingual support
Handles semantic textual similarity and classification tasks in both Chinese and English.
Multi-task capability
Covers several task types, including semantic textual similarity and classification.
Model Capabilities
Semantic textual similarity computation
Text classification
Multilingual processing (a usage sketch follows below)
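The snippet below is a minimal usage sketch: embed two sentences and score their similarity. It assumes the checkpoint is published on Hugging Face as `openbmb/MiniCPM-Embedding-Light` and loads through the `sentence-transformers` library; the repository id and the `trust_remote_code` flag are assumptions, not details confirmed by this card.

```python
# Minimal sketch: embed two sentences and compute cosine similarity.
# Assumption: the model is available as "openbmb/MiniCPM-Embedding-Light"
# and is compatible with sentence-transformers; adjust the repo id if not.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "openbmb/MiniCPM-Embedding-Light",
    trust_remote_code=True,  # assumption: the repo may ship custom model code
)

sentences = [
    "MiniCPM-Embedding-Light supports Chinese and English.",
    "该模型支持中文和英文的语义文本相似度任务。",
]
embeddings = model.encode(sentences, normalize_embeddings=True)

# With L2-normalized embeddings, the dot product equals cosine similarity.
print(f"cosine similarity: {float(embeddings[0] @ embeddings[1]):.4f}")
```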
Use Cases
E-commerce
Amazon review classification
Sentiment classification of Amazon product reviews.
Classification accuracy is 47.95% on English reviews and 43.92% on Chinese reviews (a sketch of this kind of setup follows below).
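As an illustrative sketch of how embedding models are typically scored on review classification (the MTEB classification protocol fits a lightweight classifier on frozen embeddings), the code below trains a logistic regression on stand-in reviews; the data is invented for illustration and is not the Amazon benchmark.

```python
# Hypothetical sketch: sentiment classification over frozen embeddings,
# mirroring how MTEB-style classification tasks evaluate an embedding model.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

model = SentenceTransformer("openbmb/MiniCPM-Embedding-Light", trust_remote_code=True)

train_texts = ["Great product, works perfectly!", "Broke after one day.",
               "Decent value for the price.", "Arrived damaged and late."]
train_labels = [1, 0, 1, 0]  # toy labels: 1 = positive, 0 = negative
test_texts = ["Absolutely love it.", "Terrible quality."]
test_labels = [1, 0]

X_train = model.encode(train_texts, normalize_embeddings=True)
X_test = model.encode(test_texts, normalize_embeddings=True)

clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
print("accuracy:", accuracy_score(test_labels, clf.predict(X_test)))
```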
Semantic analysis
Semantic textual similarity computation
Computes the semantic similarity between two pieces of text.
Cosine Pearson is 31.60 on the AFQMC dataset and 40.90 on ATEC (the metric computation is sketched below).
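The cosine Pearson/Spearman figures reported throughout this card are correlation coefficients between the model's cosine similarities and human-annotated gold scores on sentence pairs. A minimal sketch of that computation, using made-up gold ratings rather than AFQMC/ATEC data:

```python
# Sketch of how cosine_pearson / cosine_spearman are computed for STS tasks:
# correlate model cosine similarities with human gold similarity scores.
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("openbmb/MiniCPM-Embedding-Light", trust_remote_code=True)

pairs = [("今天天气很好", "今天天气不错"),
         ("他喜欢打篮球", "股市今天大跌"),
         ("The cat sat on the mat", "A cat is sitting on a mat")]
gold = [4.5, 0.5, 4.8]  # made-up human similarity ratings

a = model.encode([p for p, _ in pairs], normalize_embeddings=True)
b = model.encode([q for _, q in pairs], normalize_embeddings=True)
cos = np.sum(a * b, axis=1)  # row-wise cosine similarity

print("cosine_pearson: ", pearsonr(cos, gold)[0])
print("cosine_spearman:", spearmanr(cos, gold)[0])
```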
🚀 Model Evaluation Results
This README presents MiniCPM-Embedding-Light's evaluation results across a broad set of datasets, covering semantic textual similarity (STS), classification, retrieval, clustering, and reranking tasks.
📚 Documentation
Evaluation Summary
Dataset | Task Type | Main Metric | Value |
---|---|---|---|
MTEB AFQMC (default) | STS | main_score | 32.26566089398552 |
MTEB ATEC (default) (test) | STS | main_score | 40.342262626966686 |
MTEB ATEC (default) (validation) | STS | main_score | 41.15073385665892 |
MTEB AmazonCounterfactualClassification (en) | Classification | main_score | 74.1492537313433 |
MTEB AmazonPolarityClassification (default) | Classification | main_score | 92.57937499999998 |
MTEB AmazonReviewsClassification (en) | Classification | main_score | 47.95399999999999 |
MTEB AmazonReviewsClassification (zh) | Classification | main_score | 43.916000000000004 |
MTEB ArguAna (default) | Retrieval | main_score | 55.97299999999999 |
MTEB ArxivClusteringP2P (default) | Clustering | main_score | 48.31683216128774 |
MTEB ArxivClusteringS2S (default) | Clustering | main_score | 40.2951016935384 |
MTEB AskUbuntuDupQuestions (default) | Reranking | main_score | 60.45095169935259 |
MTEB BIOSSES (default) | STS | main_score | 88.4670760233051 |
MTEB BQ (default) (test) | STS | main_score | 49.77342815602789 |
MTEB BQ (default) (validation) | STS | main_score | 54.24325202324013 |
MTEB Banking77Classification (default) | Classification | main_score | 82.11038961038962 |
MTEB BiorxivClusteringP2P (default) | Clustering | main_score | 39.85718105201471 |
MTEB BiorxivClusteringS2S (default) | Clustering | main_score | 34.901371726743854 |
MTEB CLSClusteringP2P (default) | Clustering | main_score | 42.580911514601844 |
MTEB CLSClusteringS2S (default) | Clustering | main_score | 38.36369670561906 |
Detailed Metrics per Dataset
MTEB AFQMC (default)
Metric | Value |
---|---|
cosine_pearson | 31.60219205269865 |
cosine_spearman | 32.26566089398552 |
euclidean_pearson | 31.38659295608159 |
euclidean_spearman | 32.265680997074284 |
main_score | 32.26566089398552 |
manhattan_pearson | 31.012318343485934 |
manhattan_spearman | 31.881347232593882 |
pearson | 31.60219205269865 |
spearman | 32.26566089398552 |
MTEB ATEC (default) (test)
Metric | Value |
---|---|
cosine_pearson | 40.89963324512739 |
cosine_spearman | 40.342262626966686 |
euclidean_pearson | 43.26579075620696 |
euclidean_spearman | 40.34226375259283 |
main_score | 40.342262626966686 |
manhattan_pearson | 43.09428997760782 |
manhattan_spearman | 40.132604575720485 |
pearson | 40.89963324512739 |
spearman | 40.342262626966686 |
MTEB ATEC (default) (validation)
Metric | Value |
---|---|
cosine_pearson | 40.97674579633659 |
cosine_spearman | 41.15073385665892 |
euclidean_pearson | 43.12674145119401 |
euclidean_spearman | 41.15073497290901 |
main_score | 41.15073385665892 |
manhattan_pearson | 43.016332350517416 |
manhattan_spearman | 40.99128368771293 |
pearson | 40.97674579633659 |
spearman | 41.15073385665892 |
MTEB AmazonCounterfactualClassification (en)
Metric | Value |
---|---|
accuracy | 74.1492537313433 |
ap | 36.58820102143676 |
ap_weighted | 36.58820102143676 |
f1 | 67.93641050300623 |
f1_weighted | 76.47946936836382 |
main_score | 74.1492537313433 |
MTEB AmazonPolarityClassification (default)
Metric | Value |
---|---|
accuracy | 92.57937499999998 |
ap | 89.09881932276382 |
ap_weighted | 89.09881932276382 |
f1 | 92.57389464257594 |
f1_weighted | 92.57389464257594 |
main_score | 92.57937499999998 |
MTEB AmazonReviewsClassification (en)
Metric | Value |
---|---|
accuracy | 47.95399999999999 |
f1 | 45.23480325168402 |
f1_weighted | 45.23480325168403 |
main_score | 47.95399999999999 |
MTEB AmazonReviewsClassification (zh)
Metric | Value |
---|---|
accuracy | 43.916000000000004 |
f1 | 40.79038102586015 |
f1_weighted | 40.79038102586015 |
main_score | 43.916000000000004 |
MTEB ArguAna (default)
Metric | Value |
---|---|
main_score | 55.97299999999999 |
map_at_1 | 31.009999999999998 |
map_at_10 | 46.951 |
map_at_100 | 47.788000000000004 |
map_at_1000 | 47.794 |
map_at_20 | 47.656 |
map_at_3 | 41.69 |
map_at_5 | 44.795 |
mrr_at_1 | 31.57894736842105 |
mrr_at_10 | 47.150336426652245 |
mrr_at_100 | 48.00003421265431 |
mrr_at_1000 | 48.006517491673485 |
mrr_at_20 | 47.86823495425013 |
mrr_at_3 | 41.90374585111427 |
mrr_at_5 | 45.00474158368897 |
nauc_map_at_1000_diff1 | 14.400156277962079 |
nauc_map_at_1000_max | -6.074701279893042 |
nauc_map_at_1000_std | -12.047730490841793 |
nauc_map_at_100_diff1 | 14.400167976253817 |
nauc_map_at_100_max | -6.0697710559623825 |
nauc_map_at_100_std | -12.03623231778573 |
nauc_map_at_10_diff1 | 13.774482728152659 |
nauc_map_at_10_max | -5.112671934691593 |
nauc_map_at_10_std | -11.45598979914733 |
nauc_map_at_1_diff1 | 18.2188090059407 |
nauc_map_at_1_max | -6.90680836409332 |
nauc_map_at_1_std | -11.42044016086847 |
nauc_map_at_20_diff1 | 13.19308743032763 |
nauc_map_at_20_max | -5.925869069550241 |
nauc_map_at_20_std | -12.002174058926709 |
nauc_map_at_3_diff1 | 14.098445595476438 |
nauc_map_at_3_max | -5.438990657735945 |
nauc_map_at_3_std | -13.026198448199588 |
nauc_map_at_5_diff1 | 12.887695825204021 |
nauc_map_at_5_max | -5.527892954283733 |
nauc_map_at_5_std | -12.79674424315614 |
nauc_mrr_at_1000_diff1 | 12.907349893032888 |
nauc_mrr_at_1000_max | -6.476631933744489 |
nauc_mrr_at_1000_std | -12.135655638319898 |
nauc_mrr_at_100_diff1 | 12.90767904668398 |
nauc_mrr_at_100_max | -6.471625560815013 |
nauc_mrr_at_100_std | -12.124160525865376 |
nauc_mrr_at_10_diff1 | 12.898084989549307 |
nauc_mrr_at_10_max | -6.371999485392878 |
nauc_mrr_at_10_std | -12.060712822104344 |
nauc_mrr_at_1_diff1 | 16.534028417854632 |
nauc_mrr_at_1_max | -6.531221880816804 |
nauc_mrr_at_1_std | -11.427032725801363 |
nauc_mrr_at_20_diff1 | 12.772149932536516 |
nauc_mrr_at_20_max | -6.536237532046593 |
nauc_mrr_at_20_std | -12.18322445801735 |
nauc_mrr_at_3_diff1 | 13.294722540439723 |
nauc_mrr_at_3_max | -6.270285589254632 |
nauc_mrr_at_3_std | -12.590739373950477 |
nauc_mrr_at_5_diff1 | 12.701572066028916 |
nauc_mrr_at_5_max | -6.35025779804965 |
nauc_mrr_at_5_std | -12.567997847961006 |
nauc_ndcg_at_1000_diff1 | 14.04477346308097 |
nauc_ndcg_at_1000_max | -5.805803656284627 |
nauc_ndcg_at_1000_std | -11.903389341799974 |
nauc_ndcg_at_100_diff1 | 14.046024694124535 |
nauc_ndcg_at_100_max | -5.638595406841976 |
nauc_ndcg_at_100_std | -11.563718937605266 |
nauc_ndcg_at_10_diff1 | 13.774482728152659 |
nauc_ndcg_at_10_max | -5.112671934691593 |
nauc_ndcg_at_10_std | -11.45598979914733 |
nauc_ndcg_at_1_diff1 | 18.2188090059407 |
nauc_ndcg_at_1_max | -6.90680836409332 |
nauc_ndcg_at_1_std | -11.42044016086847 |
nauc_ndcg_at_20_diff1 | 13.19308743032763 |
nauc_ndcg_at_20_max | -5.925869069550241 |
nauc_ndcg_at_20_std | -12.002174058926709 |
nauc_ndcg_at_3_diff1 | 14.098445595476438 |
nauc_ndcg_at_3_max | -5.438990657735945 |
nauc_ndcg_at_3_std | -13.026198448199588 |
nauc_ndcg_at_5_diff1 | 12.887695825204021 |
nauc_ndcg_at_5_max | -5.527892954283733 |
nauc_ndcg_at_5_std | -12.79674424315614 |
nauc_precision_at_1000_diff1 | 15.720975272424962 |
nauc_precision_at_1000_max | -9.434922353859656 |
nauc_precision_at_1000_std | -12.201774463835351 |
nauc_precision_at_100_diff1 | 14.822568320368415 |
nauc_precision_at_100_max | 16.970591395955335 |
nauc_precision_at_100_std | 34.44303415297543 |
nauc_precision_at_10_diff1 | 10.924572747165758 |
nauc_precision_at_10_max | 0.7245336905113386 |
nauc_precision_at_10_std | -7.246984906362029 |
nauc_precision_at_1_diff1 | 18.2188090059407 |
nauc_precision_at_1_max | -6.90680836409332 |
nauc_precision_at_1_std | -11.42044016086847 |
nauc_precision_at_20_diff1 | -3.338584460694707 |
nauc_precision_at_20_max | -4.566280243136391 |
nauc_precision_at_20_std | -10.006136097038183 |
nauc_precision_at_3_diff1 | 12.491306916226456 |
nauc_precision_at_3_max | -3.939014391748743 |
nauc_precision_at_3_std | -14.18952698929006 |
nauc_precision_at_5_diff1 | 8.856000600248196 |
nauc_precision_at_5_max | -3.5855091847389 |
nauc_precision_at_5_std | -13.869699312071923 |
nauc_recall_at_1000_diff1 | 15.720975272417975 |
nauc_recall_at_1000_max | -9.434922353860903 |
nauc_recall_at_1000_std | -12.201774463832038 |
nauc_recall_at_100_diff1 | 14.822568320369559 |
nauc_recall_at_100_max | 16.970591395954745 |
nauc_recall_at_100_std | 34.443034152975024 |
nauc_recall_at_10_diff1 | 10.924572747165762 |
nauc_recall_at_10_max | 0.724533690511315 |
nauc_recall_at_10_std | -7.246984906362018 |
nauc_recall_at_1_diff1 | 18.2188090059407 |
nauc_recall_at_1_max | -6.90680836409332 |
nauc_recall_at_1_std | -11.42044016086847 |
nauc_recall_at_20_diff1 | -3.3385844606947677 |
nauc_recall_at_20_max | -4.566280243136629 |
nauc_recall_at_20_std | -10.006136097038366 |
nauc_recall_at_3_diff1 | 12.491306916226472 |
nauc_recall_at_3_max | -3.939014391748735 |
nauc_recall_at_3_std | -14.189526989290059 |
nauc_recall_at_5_diff1 | 8.856000600248263 |
nauc_recall_at_5_max | -3.5855091847388603 |
nauc_recall_at_5_std | -13.869699312071909 |
ndcg_at_1 | 31.009999999999998 |
ndcg_at_10 | 55.97299999999999 |
ndcg_at_100 | 59.272000000000006 |
ndcg_at_1000 | 59.407 |
ndcg_at_20 | 58.449 |
ndcg_at_3 | 45.227000000000004 |
ndcg_at_5 | 50.792 |
precision_at_1 | 31.009999999999998 |
precision_at_10 | 8.485 |
precision_at_100 | 0.9860000000000001 |
precision_at_1000 | 0.1 |
precision_at_20 | 4.723 |
precision_at_3 | 18.492 |
precision_at_5 | 13.783999999999999 |
recall_at_1 | 31.009999999999998 |
recall_at_10 | 84.851 |
recall_at_100 | 98.649 |
recall_at_1000 | 99.644 |
recall_at_20 | 94.452 |
recall_at_3 | 55.477 |
recall_at_5 | 68.919 |
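For reference, the retrieval metrics in this table (precision@k, recall@k, and binary-relevance nDCG@k) can be reproduced per query as in this small self-contained sketch; the ranking and relevance judgments below are toy values, not ArguAna data:

```python
# Toy sketch of the per-query retrieval metrics reported above,
# using binary relevance (relevant / not relevant).
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved docs that are relevant."""
    return len(set(ranked[:k]) & relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant docs found in the top-k results."""
    return len(set(ranked[:k]) & relevant) / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """nDCG@k with binary relevance."""
    dcg = sum(1.0 / math.log2(i + 2) for i, d in enumerate(ranked[:k]) if d in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal

ranked = ["d3", "d1", "d7", "d2", "d9"]  # toy ranking for one query
relevant = {"d1", "d2"}
print(precision_at_k(ranked, relevant, 5),
      recall_at_k(ranked, relevant, 5),
      ndcg_at_k(ranked, relevant, 5))
```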
MTEB ArxivClusteringP2P (default)
Metric | Value |
---|---|
main_score | 48.31683216128774 |
v_measure | 48.31683216128774 |
v_measure_std | 13.795207109799703 |
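The clustering scores in this card are V-measure values: the harmonic mean of homogeneity and completeness between predicted clusters and gold labels. A minimal sketch of such an evaluation, with synthetic blobs standing in for sentence embeddings:

```python
# Sketch of a V-measure clustering evaluation: cluster embeddings with
# k-means and compare the assignments against gold category labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import v_measure_score

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs stand in for real sentence embeddings.
embeddings = np.vstack([rng.normal(0.0, 0.1, (20, 8)),
                        rng.normal(1.0, 0.1, (20, 8))])
gold_labels = [0] * 20 + [1] * 20

pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
print("v_measure:", v_measure_score(gold_labels, pred))
```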
MTEB ArxivClusteringS2S (default)
Metric | Value |
---|---|
main_score | 40.2951016935384 |
v_measure | 40.2951016935384 |
v_measure_std | 14.193710444297869 |
MTEB AskUbuntuDupQuestions (default)
Metric | Value |
---|---|
main_score | 60.45095169935259 |
map | 60.45095169935259 |
mrr | 73.43567251461988 |
nAUC_map_diff1 | 15.357222913791704 |
nAUC_map_max | 24.301239659848346 |
nAUC_map_std | 18.26732583044278 |
nAUC_mrr_diff1 | 24.108010981589057 |
nAUC_mrr_max | 34.90261214387396 |
nAUC_mrr_std | 20.350034497982126 |
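The reranking table above reports MAP and MRR; MRR is the mean, over queries, of the reciprocal rank of the first relevant result. A toy sketch:

```python
def mean_reciprocal_rank(rankings, relevant_sets):
    """MRR: average of 1/rank of the first relevant doc for each query."""
    total = 0.0
    for ranked, relevant in zip(rankings, relevant_sets):
        total += next((1.0 / (i + 1) for i, d in enumerate(ranked) if d in relevant), 0.0)
    return total / len(rankings)

# Two toy queries: first relevant doc at rank 2, then at rank 1.
print(mean_reciprocal_rank([["a", "b", "c"], ["x", "y"]], [{"b"}, {"x"}]))  # 0.75
```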
MTEB BIOSSES (default)
Metric | Value |
---|---|
cosine_pearson | 90.16604991710759 |
cosine_spearman | 88.4670760233051 |
euclidean_pearson | 89.02378164860428 |
euclidean_spearman | 88.4670760233051 |
main_score | 88.4670760233051 |
manhattan_pearson | 88.8866912507422 |
manhattan_spearman | 88.2755053931781 |
pearson | 90.16604991710759 |
spearman | 88.4670760233051 |
MTEB BQ (default) (test)
Metric | Value |
---|---|
cosine_pearson | 49.45233904713201 |
cosine_spearman | 49.77342815602789 |
euclidean_pearson | 49.13579036236359 |
euclidean_spearman | 49.77342122767529 |
main_score | 49.77342815602789 |
manhattan_pearson | 49.01322677955527 |
manhattan_spearman | 49.702538779772226 |
pearson | 49.45233904713201 |
spearman | 49.77342815602789 |
MTEB BQ (default) (validation)
Metric | Value |
---|---|
cosine_pearson | 53.43473222697715 |
cosine_spearman | 54.24325202324013 |
euclidean_pearson | 53.4053341221681 |
euclidean_spearman | 54.2432485591385 |
main_score | 54.24325202324013 |
manhattan_pearson | 53.31602762068146 |
manhattan_spearman | 54.180811590825925 |
pearson | 53.43473222697715 |
spearman | 54.24325202324013 |
MTEB Banking77Classification (default)
Metric | Value |
---|---|
accuracy | 82.11038961038962 |
f1 | 81.50275371635729 |
f1_weighted | 81.50275371635732 |
main_score | 82.11038961038962 |
MTEB BiorxivClusteringP2P (default)
Metric | Value |
---|---|
main_score | 39.85718105201471 |
v_measure | 39.85718105201471 |
v_measure_std | 0.9098592525717781 |
MTEB BiorxivClusteringS2S (default)
Metric | Value |
---|---|
main_score | 34.901371726743854 |
v_measure | 34.901371726743854 |
v_measure_std | 0.49131958662099773 |
MTEB CLSClusteringP2P (default)
Metric | Value |
---|---|
main_score | 42.580911514601844 |
v_measure | 42.580911514601844 |
v_measure_std | 1.3262494874619402 |
MTEB CLSClusteringS2S (default)
Metric | Value |
---|---|
main_score | 38.36369670561906 |
v_measure | 38.36369670561906 |
v_measure_std | 1.3030031287521193 |