MiniCPM Embedding Light
Developed by openbmb
The model has been evaluated on multiple Chinese and English datasets, including semantic textual similarity and classification tasks.
Downloads: 2,172
Released: 1/17/2025
Model Overview
The model is designed primarily for semantic textual similarity and classification tasks, and supports both Chinese and English.
Model Highlights
Multilingual support
Handles semantic textual similarity and classification in both Chinese and English.
Multi-task capability
Covers multiple task types, including semantic textual similarity and classification.
Model Capabilities
Semantic textual similarity computation
Text classification
Multilingual processing
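A minimal usage sketch with the `sentence-transformers` library is shown below. The model ID `openbmb/MiniCPM-Embedding-Light` and the `trust_remote_code=True` flag are assumptions about how the model is published; consult the official model card for the exact loading instructions (some MiniCPM embedding models also expect an instruction prefix on queries).

```python
# Minimal sketch: encode Chinese and English sentences and compare them.
# Assumes the model is published as "openbmb/MiniCPM-Embedding-Light" and can be
# loaded through sentence-transformers; verify against the official model card.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("openbmb/MiniCPM-Embedding-Light", trust_remote_code=True)

sentences = [
    "今天天氣很好",                  # Chinese: "The weather is nice today"
    "The weather is great today",
    "I need to return this product",
]

embeddings = model.encode(sentences, normalize_embeddings=True)

# With L2-normalized embeddings, the dot product equals cosine similarity.
cross_lingual = embeddings[0] @ embeddings[1]
unrelated = embeddings[0] @ embeddings[2]
print(f"zh-en paraphrase similarity: {cross_lingual:.4f}")
print(f"unrelated pair similarity:   {unrelated:.4f}")
```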
Use Cases
E-commerce
Amazon review classification
Sentiment classification of Amazon product reviews.
Classification accuracy reaches 47.95% on English reviews and 43.92% on Chinese reviews.
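One way to implement this use case is to treat the sentence embeddings as fixed features and train a lightweight classifier on top of them. The sketch below uses scikit-learn's `LogisticRegression` on a handful of made-up bilingual reviews; the texts, labels, and model ID are illustrative assumptions, not the actual Amazon benchmark setup.

```python
# Sketch: review sentiment classification with embeddings as fixed features.
# The reviews and labels are toy examples, not from the Amazon datasets.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

model = SentenceTransformer("openbmb/MiniCPM-Embedding-Light", trust_remote_code=True)

train_reviews = [
    "這個產品質量很好，非常滿意",                    # positive (zh)
    "Arrived broken, complete waste of money",        # negative (en)
    "物流太慢，體驗很差",                            # negative (zh)
    "Works exactly as described, would buy again",    # positive (en)
]
train_labels = [1, 0, 0, 1]  # 1 = positive, 0 = negative

X_train = model.encode(train_reviews, normalize_embeddings=True)
clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)

test_review = ["The screen cracked after one week"]
pred = clf.predict(model.encode(test_review, normalize_embeddings=True))
print("predicted label:", pred[0])
```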
Semantic analysis
Semantic textual similarity computation
Compute the semantic similarity between two pieces of text.
The cosine Pearson score is 31.60 on the AFQMC dataset and 40.90 on ATEC.
🚀 Evaluation Results
This README presents the model's evaluation results on a wide range of datasets, covering semantic textual similarity (STS), classification, retrieval, clustering, and other task types.
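These figures follow the MTEB benchmark protocol. A sketch of how a subset of the evaluation can be reproduced with the `mteb` Python package is shown below; the task names match the dataset names in the tables, but the runner API differs between `mteb` versions, so treat this as an outline rather than a verified command.

```python
# Sketch: reproducing part of the MTEB evaluation reported below.
# Assumes the model loads via sentence-transformers as "openbmb/MiniCPM-Embedding-Light".
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("openbmb/MiniCPM-Embedding-Light", trust_remote_code=True)

# A few of the tasks that appear in the result tables.
tasks = mteb.get_tasks(tasks=["AFQMC", "ATEC", "AmazonReviewsClassification", "ArguAna"])
evaluation = mteb.MTEB(tasks=tasks)

# Scores are written as JSON files under the output folder.
results = evaluation.run(model, output_folder="results/MiniCPM-Embedding-Light")
```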
📚 Detailed Results
### Evaluation Summary
Dataset | Task Type | Main Metric | Value |
---|---|---|---|
MTEB AFQMC (default) | STS | main_score | 32.26566089398552 |
MTEB ATEC (default) (test) | STS | main_score | 40.342262626966686 |
MTEB ATEC (default) (validation) | STS | main_score | 41.15073385665892 |
MTEB AmazonCounterfactualClassification (en) | Classification | main_score | 74.1492537313433 |
MTEB AmazonPolarityClassification (default) | Classification | main_score | 92.57937499999998 |
MTEB AmazonReviewsClassification (en) | Classification | main_score | 47.95399999999999 |
MTEB AmazonReviewsClassification (zh) | Classification | main_score | 43.916000000000004 |
MTEB ArguAna (default) | Retrieval | main_score | 55.97299999999999 |
MTEB ArxivClusteringP2P (default) | Clustering | main_score | 48.31683216128774 |
MTEB ArxivClusteringS2S (default) | Clustering | main_score | 40.2951016935384 |
MTEB AskUbuntuDupQuestions (default) | Reranking | main_score | 60.45095169935259 |
MTEB BIOSSES (default) | STS | main_score | 88.4670760233051 |
MTEB BQ (default) (test) | STS | main_score | 49.77342815602789 |
MTEB BQ (default) (validation) | STS | main_score | 54.24325202324013 |
MTEB Banking77Classification (default) | Classification | main_score | 82.11038961038962 |
MTEB BiorxivClusteringP2P (default) | Clustering | main_score | 39.85718105201471 |
MTEB BiorxivClusteringS2S (default) | Clustering | main_score | 34.901371726743854 |
MTEB CLSClusteringP2P (default) | Clustering | main_score | 42.580911514601844 |
MTEB CLSClusteringS2S (default) | Clustering | main_score | 38.36369670561906 |
### Detailed Metrics by Dataset
#### MTEB AFQMC (default)
Metric | Value |
---|---|
cosine_pearson | 31.60219205269865 |
cosine_spearman | 32.26566089398552 |
euclidean_pearson | 31.38659295608159 |
euclidean_spearman | 32.265680997074284 |
main_score | 32.26566089398552 |
manhattan_pearson | 31.012318343485934 |
manhattan_spearman | 31.881347232593882 |
pearson | 31.60219205269865 |
spearman | 32.26566089398552 |
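For STS datasets such as AFQMC, `cosine_pearson` and `cosine_spearman` are the Pearson and Spearman correlations between the model's cosine similarity for each sentence pair and the human-annotated gold score; the `euclidean_*` and `manhattan_*` rows use distance-based similarities instead. A minimal sketch of the cosine variant, with made-up pairs and gold labels rather than actual AFQMC data, looks like this:

```python
# Sketch: computing cosine_pearson / cosine_spearman for an STS-style dataset.
# The sentence pairs and gold scores are toy placeholders, not AFQMC data.
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("openbmb/MiniCPM-Embedding-Light", trust_remote_code=True)

pairs = [
    ("花唄怎麼開通", "如何開通花唄"),
    ("怎麼申請退款", "退款流程是什麼"),
    ("今天天氣怎麼樣", "花唄額度怎麼提升"),
    ("借唄利息多少", "今天股市漲了嗎"),
]
gold = np.array([1.0, 1.0, 0.0, 0.0])  # human similarity labels

emb_a = model.encode([a for a, _ in pairs], normalize_embeddings=True)
emb_b = model.encode([b for _, b in pairs], normalize_embeddings=True)
cos_sim = (emb_a * emb_b).sum(axis=1)  # per-pair cosine similarity

r_pearson, _ = pearsonr(cos_sim, gold)
r_spearman, _ = spearmanr(cos_sim, gold)
print("cosine_pearson :", r_pearson)
print("cosine_spearman:", r_spearman)
```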
#### MTEB ATEC (default) (test)
Metric | Value |
---|---|
cosine_pearson | 40.89963324512739 |
cosine_spearman | 40.342262626966686 |
euclidean_pearson | 43.26579075620696 |
euclidean_spearman | 40.34226375259283 |
main_score | 40.342262626966686 |
manhattan_pearson | 43.09428997760782 |
manhattan_spearman | 40.132604575720485 |
pearson | 40.89963324512739 |
spearman | 40.342262626966686 |
#### MTEB ATEC (default) (validation)
Metric | Value |
---|---|
cosine_pearson | 40.97674579633659 |
cosine_spearman | 41.15073385665892 |
euclidean_pearson | 43.12674145119401 |
euclidean_spearman | 41.15073497290901 |
main_score | 41.15073385665892 |
manhattan_pearson | 43.016332350517416 |
manhattan_spearman | 40.99128368771293 |
pearson | 40.97674579633659 |
spearman | 41.15073385665892 |
#### MTEB AmazonCounterfactualClassification (en)
Metric | Value |
---|---|
accuracy | 74.1492537313433 |
ap | 36.58820102143676 |
ap_weighted | 36.58820102143676 |
f1 | 67.93641050300623 |
f1_weighted | 76.47946936836382 |
main_score | 74.1492537313433 |
#### MTEB AmazonPolarityClassification (default)
Metric | Value |
---|---|
accuracy | 92.57937499999998 |
ap | 89.09881932276382 |
ap_weighted | 89.09881932276382 |
f1 | 92.57389464257594 |
f1_weighted | 92.57389464257594 |
main_score | 92.57937499999998 |
#### MTEB AmazonReviewsClassification (en)
Metric | Value |
---|---|
accuracy | 47.95399999999999 |
f1 | 45.23480325168402 |
f1_weighted | 45.23480325168403 |
main_score | 47.95399999999999 |
#### MTEB AmazonReviewsClassification (zh)
Metric | Value |
---|---|
accuracy | 43.916000000000004 |
f1 | 40.79038102586015 |
f1_weighted | 40.79038102586015 |
main_score | 43.916000000000004 |
#### MTEB ArguAna (default)
Metric | Value |
---|---|
main_score | 55.97299999999999 |
map_at_1 | 31.009999999999998 |
map_at_10 | 46.951 |
map_at_100 | 47.788000000000004 |
map_at_1000 | 47.794 |
map_at_20 | 47.656 |
map_at_3 | 41.69 |
map_at_5 | 44.795 |
mrr_at_1 | 31.57894736842105 |
mrr_at_10 | 47.150336426652245 |
mrr_at_100 | 48.00003421265431 |
mrr_at_1000 | 48.006517491673485 |
mrr_at_20 | 47.86823495425013 |
mrr_at_3 | 41.90374585111427 |
mrr_at_5 | 45.00474158368897 |
nauc_map_at_1000_diff1 | 14.400156277962079 |
nauc_map_at_1000_max | -6.074701279893042 |
nauc_map_at_1000_std | -12.047730490841793 |
nauc_map_at_100_diff1 | 14.400167976253817 |
nauc_map_at_100_max | -6.0697710559623825 |
nauc_map_at_100_std | -12.03623231778573 |
nauc_map_at_10_diff1 | 13.774482728152659 |
nauc_map_at_10_max | -5.112671934691593 |
nauc_map_at_10_std | -11.45598979914733 |
nauc_map_at_1_diff1 | 18.2188090059407 |
nauc_map_at_1_max | -6.90680836409332 |
nauc_map_at_1_std | -11.42044016086847 |
nauc_map_at_20_diff1 | 13.19308743032763 |
nauc_map_at_20_max | -5.925869069550241 |
nauc_map_at_20_std | -12.002174058926709 |
nauc_map_at_3_diff1 | 14.098445595476438 |
nauc_map_at_3_max | -5.438990657735945 |
nauc_map_at_3_std | -13.026198448199588 |
nauc_map_at_5_diff1 | 12.887695825204021 |
nauc_map_at_5_max | -5.527892954283733 |
nauc_map_at_5_std | -12.79674424315614 |
nauc_mrr_at_1000_diff1 | 12.907349893032888 |
nauc_mrr_at_1000_max | -6.476631933744489 |
nauc_mrr_at_1000_std | -12.135655638319898 |
nauc_mrr_at_100_diff1 | 12.90767904668398 |
nauc_mrr_at_100_max | -6.471625560815013 |
nauc_mrr_at_100_std | -12.124160525865376 |
nauc_mrr_at_10_diff1 | 12.898084989549307 |
nauc_mrr_at_10_max | -6.371999485392878 |
nauc_mrr_at_10_std | -12.060712822104344 |
nauc_mrr_at_1_diff1 | 16.534028417854632 |
nauc_mrr_at_1_max | -6.531221880816804 |
nauc_mrr_at_1_std | -11.427032725801363 |
nauc_mrr_at_20_diff1 | 12.772149932536516 |
nauc_mrr_at_20_max | -6.536237532046593 |
nauc_mrr_at_20_std | -12.18322445801735 |
nauc_mrr_at_3_diff1 | 13.294722540439723 |
nauc_mrr_at_3_max | -6.270285589254632 |
nauc_mrr_at_3_std | -12.590739373950477 |
nauc_mrr_at_5_diff1 | 12.701572066028916 |
nauc_mrr_at_5_max | -6.35025779804965 |
nauc_mrr_at_5_std | -12.567997847961006 |
nauc_ndcg_at_1000_diff1 | 14.04477346308097 |
nauc_ndcg_at_1000_max | -5.805803656284627 |
nauc_ndcg_at_1000_std | -11.903389341799974 |
nauc_ndcg_at_100_diff1 | 14.046024694124535 |
nauc_ndcg_at_100_max | -5.638595406841976 |
nauc_ndcg_at_100_std | -11.563718937605266 |
nauc_ndcg_at_10_diff1 | 13.774482728152659 |
nauc_ndcg_at_10_max | -5.112671934691593 |
nauc_ndcg_at_10_std | -11.45598979914733 |
nauc_ndcg_at_1_diff1 | 18.2188090059407 |
nauc_ndcg_at_1_max | -6.90680836409332 |
nauc_ndcg_at_1_std | -11.42044016086847 |
nauc_ndcg_at_20_diff1 | 13.19308743032763 |
nauc_ndcg_at_20_max | -5.925869069550241 |
nauc_ndcg_at_20_std | -12.002174058926709 |
nauc_ndcg_at_3_diff1 | 14.098445595476438 |
nauc_ndcg_at_3_max | -5.438990657735945 |
nauc_ndcg_at_3_std | -13.026198448199588 |
nauc_ndcg_at_5_diff1 | 12.887695825204021 |
nauc_ndcg_at_5_max | -5.527892954283733 |
nauc_ndcg_at_5_std | -12.79674424315614 |
nauc_precision_at_1000_diff1 | 15.720975272424962 |
nauc_precision_at_1000_max | -9.434922353859656 |
nauc_precision_at_1000_std | -12.201774463835351 |
nauc_precision_at_100_diff1 | 14.822568320368415 |
nauc_precision_at_100_max | 16.970591395955335 |
nauc_precision_at_100_std | 34.44303415297543 |
nauc_precision_at_10_diff1 | 10.924572747165758 |
nauc_precision_at_10_max | 0.7245336905113386 |
nauc_precision_at_10_std | -7.246984906362029 |
nauc_precision_at_1_diff1 | 18.2188090059407 |
nauc_precision_at_1_max | -6.90680836409332 |
nauc_precision_at_1_std | -11.42044016086847 |
nauc_precision_at_20_diff1 | -3.338584460694707 |
nauc_precision_at_20_max | -4.566280243136391 |
nauc_precision_at_20_std | -10.006136097038183 |
nauc_precision_at_3_diff1 | 12.491306916226456 |
nauc_precision_at_3_max | -3.939014391748743 |
nauc_precision_at_3_std | -14.18952698929006 |
nauc_precision_at_5_diff1 | 8.856000600248196 |
nauc_precision_at_5_max | -3.5855091847389 |
nauc_precision_at_5_std | -13.869699312071923 |
nauc_recall_at_1000_diff1 | 15.720975272417975 |
nauc_recall_at_1000_max | -9.434922353860903 |
nauc_recall_at_1000_std | -12.201774463832038 |
nauc_recall_at_100_diff1 | 14.822568320369559 |
nauc_recall_at_100_max | 16.970591395954745 |
nauc_recall_at_100_std | 34.443034152975024 |
nauc_recall_at_10_diff1 | 10.924572747165762 |
nauc_recall_at_10_max | 0.724533690511315 |
nauc_recall_at_10_std | -7.246984906362018 |
nauc_recall_at_1_diff1 | 18.2188090059407 |
nauc_recall_at_1_max | -6.90680836409332 |
nauc_recall_at_1_std | -11.42044016086847 |
nauc_recall_at_20_diff1 | -3.3385844606947677 |
nauc_recall_at_20_max | -4.566280243136629 |
nauc_recall_at_20_std | -10.006136097038366 |
nauc_recall_at_3_diff1 | 12.491306916226472 |
nauc_recall_at_3_max | -3.939014391748735 |
nauc_recall_at_3_std | -14.189526989290059 |
nauc_recall_at_5_diff1 | 8.856000600248263 |
nauc_recall_at_5_max | -3.5855091847388603 |
nauc_recall_at_5_std | -13.869699312071909 |
ndcg_at_1 | 31.009999999999998 |
ndcg_at_10 | 55.97299999999999 |
ndcg_at_100 | 59.272000000000006 |
ndcg_at_1000 | 59.407 |
ndcg_at_20 | 58.449 |
ndcg_at_3 | 45.227000000000004 |
ndcg_at_5 | 50.792 |
precision_at_1 | 31.009999999999998 |
precision_at_10 | 8.485 |
precision_at_100 | 0.9860000000000001 |
precision_at_1000 | 0.1 |
precision_at_20 | 4.723 |
precision_at_3 | 18.492 |
precision_at_5 | 13.783999999999999 |
recall_at_1 | 31.009999999999998 |
recall_at_10 | 84.851 |
recall_at_100 | 98.649 |
recall_at_1000 | 99.644 |
recall_at_20 | 94.452 |
recall_at_3 | 55.477 |
recall_at_5 | 68.919 |
#### MTEB ArxivClusteringP2P (default)
Metric | Value |
---|---|
main_score | 48.31683216128774 |
v_measure | 48.31683216128774 |
v_measure_std | 13.795207109799703 |
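For clustering tasks, the main score is `v_measure`, the harmonic mean of homogeneity and completeness between the predicted clusters and the gold category labels; MTEB typically obtains the predicted clusters by running k-means over the embeddings. The sketch below uses toy documents and scikit-learn's `v_measure_score`; the data and cluster count are illustrative only.

```python
# Sketch: v_measure for a clustering task — cluster embeddings with k-means,
# then compare the predicted assignments against gold category labels.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.metrics import v_measure_score

model = SentenceTransformer("openbmb/MiniCPM-Embedding-Light", trust_remote_code=True)

# Toy documents and gold topic labels (placeholders, not arXiv abstracts).
docs = [
    "graph neural networks for molecules",
    "scaling laws for transformer language models",
    "protein structure prediction with deep learning",
    "single-cell gene expression analysis",
]
gold_labels = [0, 0, 1, 1]

embeddings = model.encode(docs, normalize_embeddings=True)
pred_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)

print("v_measure:", v_measure_score(gold_labels, pred_labels))
```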
#### MTEB ArxivClusteringS2S (default)
Metric | Value |
---|---|
main_score | 40.2951016935384 |
v_measure | 40.2951016935384 |
v_measure_std | 14.193710444297869 |
#### MTEB AskUbuntuDupQuestions (default)
Metric | Value |
---|---|
main_score | 60.45095169935259 |
map | 60.45095169935259 |
mrr | 73.43567251461988 |
nAUC_map_diff1 | 15.357222913791704 |
nAUC_map_max | 24.301239659848346 |
nAUC_map_std | 18.26732583044278 |
nAUC_mrr_diff1 | 24.108010981589057 |
nAUC_mrr_max | 34.90261214387396 |
nAUC_mrr_std | 20.350034497982126 |
#### MTEB BIOSSES (default)
Metric | Value |
---|---|
cosine_pearson | 90.16604991710759 |
cosine_spearman | 88.4670760233051 |
euclidean_pearson | 89.02378164860428 |
euclidean_spearman | 88.4670760233051 |
main_score | 88.4670760233051 |
manhattan_pearson | 88.8866912507422 |
manhattan_spearman | 88.2755053931781 |
pearson | 90.16604991710759 |
spearman | 88.4670760233051 |
#### MTEB BQ (default) (test)
Metric | Value |
---|---|
cosine_pearson | 49.45233904713201 |
cosine_spearman | 49.77342815602789 |
euclidean_pearson | 49.13579036236359 |
euclidean_spearman | 49.77342122767529 |
main_score | 49.77342815602789 |
manhattan_pearson | 49.01322677955527 |
manhattan_spearman | 49.702538779772226 |
pearson | 49.45233904713201 |
spearman | 49.77342815602789 |
#### MTEB BQ (default) (validation)
Metric | Value |
---|---|
cosine_pearson | 53.43473222697715 |
cosine_spearman | 54.24325202324013 |
euclidean_pearson | 53.4053341221681 |
euclidean_spearman | 54.2432485591385 |
main_score | 54.24325202324013 |
manhattan_pearson | 53.31602762068146 |
manhattan_spearman | 54.180811590825925 |
pearson | 53.43473222697715 |
spearman | 54.24325202324013 |
#### MTEB Banking77Classification (default)
Metric | Value |
---|---|
accuracy | 82.11038961038962 |
f1 | 81.50275371635729 |
f1_weighted | 81.50275371635732 |
main_score | 82.11038961038962 |
#### MTEB BiorxivClusteringP2P (default)
Metric | Value |
---|---|
main_score | 39.85718105201471 |
v_measure | 39.85718105201471 |
v_measure_std | 0.9098592525717781 |
#### MTEB BiorxivClusteringS2S (default)
Metric | Value |
---|---|
main_score | 34.901371726743854 |
v_measure | 34.901371726743854 |
v_measure_std | 0.49131958662099773 |
#### MTEB CLSClusteringP2P (default)
Metric | Value |
---|---|
main_score | 42.580911514601844 |
v_measure | 42.580911514601844 |
v_measure_std | 1.3262494874619402 |
#### MTEB CLSClusteringS2S (default)
Metric | Value |
---|---|
main_score | 38.36369670561906 |
v_measure | 38.36369670561906 |
v_measure_std | 1.3030031287521193 |
Phi 2 GGUF
Other
Phi-2 is a small but powerful language model developed by Microsoft, with 2.7 billion parameters, focused on efficient inference and high-quality text generation.
Large Language Model | Multilingual
TheBloke
41.5M
205
Roberta Large
MIT
A large English language model pretrained with the masked language modeling objective, using an improved BERT training recipe.
Large Language Model | English
FacebookAI
19.4M
212
Distilbert Base Uncased
Apache-2.0
DistilBERT is a distilled version of the BERT base model that is lighter and more efficient while retaining comparable performance, suitable for NLP tasks such as sequence classification and token classification.
Large Language Model | English
distilbert
11.1M
669
Llama 3.1 8B Instruct GGUF
Meta Llama 3.1 8B Instruct is a multilingual large language model optimized for multilingual dialogue use cases, with strong results on common industry benchmarks.
Large Language Model | English
modularai
9.7M
4
Xlm Roberta Base
MIT
XLM-RoBERTa is a multilingual model pretrained on 2.5TB of filtered CommonCrawl data covering 100 languages, trained with the masked language modeling objective.
Large Language Model | Multilingual
FacebookAI
9.6M
664
Roberta Base
MIT
A Transformer-based English pretrained model trained on large text corpora with the masked language modeling objective, supporting text feature extraction and downstream fine-tuning.
Large Language Model | English
FacebookAI
9.3M
488
Opt 125m
Other
OPT is a suite of open pretrained Transformer language models released by Meta AI, ranging from 125 million to 175 billion parameters, designed to match the performance of the GPT-3 series while enabling open research on large-scale language models.
Large Language Model | English
facebook
6.3M
198
1
A pretrained model based on the transformers library, suitable for a variety of NLP tasks.
Large Language Model | Transformers
unslothai
6.2M
1
Llama 3.1 8B Instruct
Llama 3.1 is Meta's multilingual large language model family, available in 8B, 70B, and 405B parameter sizes, supporting eight languages and code generation, and optimized for multilingual dialogue scenarios.
Large Language Model | Transformers | Multilingual
meta-llama
5.7M
3,898
T5 Base
Apache-2.0
T5 Base is a text-to-text transfer Transformer model developed by Google, with 220 million parameters, supporting multilingual NLP tasks.
Large Language Model | Multilingual
google-t5
5.4M
702
Featured AI Models
Llama 3 Typhoon V1.5x 8b Instruct
An 8-billion-parameter instruction model designed for Thai, with performance comparable to GPT-3.5-turbo, optimized for application scenarios, retrieval-augmented generation, constrained generation, and reasoning tasks.
Large Language Model | Transformers | Multilingual
scb10x
3,269
16
Cadet Tiny
Openrail
Cadet-Tiny is an ultra-small dialogue model trained on the SODA dataset, designed for inference on edge devices, at roughly 2% of the size of the Cosmo-3B model.
Dialogue System | Transformers | English
ToddGoldfarb
2,691
6
Roberta Base Chinese Extractive Qa
A Chinese extractive question answering model based on the RoBERTa architecture, designed to extract answers from a given passage of text.
Question Answering | Chinese
uer
2,694
98