Snowflake Arctic Embed M V1.5
Developed by Snowflake
Snowflake Arctic Embed M v1.5 is an efficient sentence-embedding model focused on sentence-similarity and feature-extraction tasks.
Downloads: 219.46k
Released: 7/3/2024
Model Overview
The model is designed to produce high-quality sentence embeddings, supports a range of retrieval and similarity tasks, and performs well on the MTEB benchmark.
Model Highlights
Efficient sentence embeddings
Produces high-quality sentence-embedding representations suitable for a wide range of similarity tasks
Validated on MTEB
Performs strongly across multiple MTEB benchmark datasets
Transformers.js support
Can run in the browser via Transformers.js
Model Capabilities
Sentence-similarity computation
Feature extraction
Text retrieval
Semantic search
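The capabilities above boil down to an embed-and-rank workflow: encode a query and a set of documents, then rank documents by cosine similarity. The sketch below keeps the similarity math offline with mock vectors so it is self-contained; the commented-out model calls show how real embeddings would be obtained (the query prefix shown is an assumption based on common arctic-embed usage, not taken from this document).

```python
import numpy as np

def cosine_rank(query_vec, doc_vecs):
    """Rank documents by cosine similarity to a query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q               # cosine similarity per document
    order = np.argsort(-scores)  # best match first
    return order, scores[order]

# In real use the vectors would come from the model, e.g. (hypothetical setup):
#   model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m-v1.5")
#   query_vec = model.encode("Represent this sentence for searching relevant passages: " + query)
#   doc_vecs  = model.encode(documents)
# Mock 4-dimensional embeddings for illustration:
query_vec = np.array([1.0, 0.0, 1.0, 0.0])
doc_vecs = np.array([
    [1.0, 0.1, 0.9, 0.0],  # close to the query
    [0.0, 1.0, 0.0, 1.0],  # orthogonal to the query
])
order, scores = cosine_rank(query_vec, doc_vecs)
```

Normalizing both sides first means the dot product is exactly the cosine similarity, which is the standard scoring choice for embedding retrieval.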
Use Cases
Information Retrieval
Question answering
Retrieve the answers most relevant to a user's question
Shows strong retrieval performance on the CQADupstack datasets
Document-similarity search
Find semantically similar documents or passages
Achieves a main score of 59.53 on the ArguAna dataset
Content Recommendation
Related-content recommendation
Recommend related content based on semantic similarity
🚀 snowflake-arctic-embed-m-v1.5
snowflake-arctic-embed-m-v1.5 is a model for sentence-similarity and retrieval tasks, evaluated across multiple MTEB datasets with solid results.
📚 Documentation
Model Tags and Metadata
- pipeline_tag: sentence-similarity
- tags: sentence-transformers, feature-extraction, sentence-similarity, mteb, arctic, snowflake-arctic-embed, transformers.js
- license: apache-2.0
- task: Retrieval
Evaluation Results
MTEB ArguAna dataset
Metric | Value |
---|---|
main_score | 59.53000000000001 |
map_at_1 | 34.282000000000004 |
map_at_10 | 50.613 |
map_at_100 | 51.269 |
map_at_1000 | 51.271 |
map_at_20 | 51.158 |
map_at_3 | 45.626 |
map_at_5 | 48.638 |
mrr_at_1 | 34.92176386913229 |
mrr_at_10 | 50.856081645555406 |
mrr_at_100 | 51.510739437069034 |
mrr_at_1000 | 51.51299498830165 |
mrr_at_20 | 51.39987941081724 |
mrr_at_3 | 45.993361782835514 |
mrr_at_5 | 48.88098624940742 |
nauc_map_at_1000_diff1 | 10.628675774160785 |
nauc_map_at_1000_max | -10.11742589992339 |
nauc_map_at_1000_std | -18.29277379812427 |
nauc_map_at_100_diff1 | 10.63250240035489 |
nauc_map_at_100_max | -10.112078786734363 |
nauc_map_at_100_std | -18.288524872706834 |
nauc_map_at_10_diff1 | 10.476494913081712 |
nauc_map_at_10_max | -9.890937746734037 |
nauc_map_at_10_std | -18.279750514750443 |
nauc_map_at_1_diff1 | 14.549204048461151 |
nauc_map_at_1_max | -12.230560087701225 |
nauc_map_at_1_std | -19.469903650130362 |
nauc_map_at_20_diff1 | 10.586564571825674 |
nauc_map_at_20_max | -10.00292720526217 |
nauc_map_at_20_std | -18.258077347878064 |
nauc_map_at_3_diff1 | 10.378663968090372 |
nauc_map_at_3_max | -10.458896171786185 |
nauc_map_at_3_std | -18.38852760333766 |
nauc_map_at_5_diff1 | 10.235960275925581 |
nauc_map_at_5_max | -10.239496080409058 |
nauc_map_at_5_std | -18.817023479445886 |
nauc_mrr_at_1000_diff1 | 8.718212649575722 |
nauc_mrr_at_1000_max | -10.81022794038691 |
nauc_mrr_at_1000_std | -17.87669499555167 |
nauc_mrr_at_100_diff1 | 8.722174171165133 |
nauc_mrr_at_100_max | -10.804840985713525 |
nauc_mrr_at_100_std | -17.872487099359986 |
nauc_mrr_at_10_diff1 | 8.609421635870238 |
nauc_mrr_at_10_max | -10.568644717548432 |
nauc_mrr_at_10_std | -17.872968762635814 |
nauc_mrr_at_1_diff1 | 12.69590006263834 |
nauc_mrr_at_1_max | -12.082056561238321 |
nauc_mrr_at_1_std | -18.036424092186657 |
nauc_mrr_at_20_diff1 | 8.684842497970315 |
nauc_mrr_at_20_max | -10.691578914627286 |
nauc_mrr_at_20_std | -17.84350301434992 |
nauc_mrr_at_3_diff1 | 8.649761557556763 |
nauc_mrr_at_3_max | -11.104694428047496 |
nauc_mrr_at_3_std | -18.149917948370344 |
nauc_mrr_at_5_diff1 | 8.433489750038396 |
nauc_mrr_at_5_max | -10.917772454397436 |
nauc_mrr_at_5_std | -18.4094211134111 |
nauc_ndcg_at_1000_diff1 | 10.19041067807956 |
nauc_ndcg_at_1000_max | -9.54328201605796 |
nauc_ndcg_at_1000_std | -17.824620427456633 |
nauc_ndcg_at_100_diff1 | 10.289491087585963 |
nauc_ndcg_at_100_max | -9.357214331420337 |
nauc_ndcg_at_100_std | -17.657600653632873 |
nauc_ndcg_at_10_diff1 | 9.435530877596092 |
nauc_ndcg_at_10_max | -8.182581635383546 |
nauc_ndcg_at_10_std | -17.603156479980388 |
nauc_ndcg_at_1_diff1 | 14.549204048461151 |
nauc_ndcg_at_1_max | -12.230560087701225 |
nauc_ndcg_at_1_std | -19.469903650130362 |
nauc_ndcg_at_20_diff1 | 9.885227087275197 |
nauc_ndcg_at_20_max | -8.52362662391439 |
nauc_ndcg_at_20_std | -17.441705436231764 |
nauc_ndcg_at_3_diff1 | 9.22542769998547 |
nauc_ndcg_at_3_max | -9.903590564219288 |
nauc_ndcg_at_3_std | -18.357220221111593 |
nauc_ndcg_at_5_diff1 | 8.8756720745828 |
nauc_ndcg_at_5_max | -9.269764943861245 |
nauc_ndcg_at_5_std | -19.009229433187784 |
nauc_precision_at_1000_diff1 | 3.733355117431035 |
nauc_precision_at_1000_max | 3.9603571352517393 |
nauc_precision_at_1000_std | 70.07345061131439 |
nauc_precision_at_100_diff1 | 29.019032142462457 |
nauc_precision_at_100_max | 40.75153328286103 |
nauc_precision_at_100_std | 62.634249549126594 |
nauc_precision_at_10_diff1 | 2.5762677254910353 |
nauc_precision_at_10_max | 6.096298633773051 |
nauc_precision_at_10_std | -11.507400451348587 |
nauc_precision_at_1_diff1 | 14.549204048461151 |
nauc_precision_at_1_max | -12.230560087701225 |
nauc_precision_at_1_std | -19.469903650130362 |
nauc_precision_at_20_diff1 | 1.715540124567996 |
nauc_precision_at_20_max | 21.53546453945913 |
nauc_precision_at_20_std | 1.537961142195571 |
nauc_precision_at_3_diff1 | 5.701850652555737 |
nauc_precision_at_3_max | -8.180345365085552 |
nauc_precision_at_3_std | -18.37033750502482 |
nauc_precision_at_5_diff1 | 3.6053552181042843 |
nauc_precision_at_5_max | -5.207647070615612 |
nauc_precision_at_5_std | -19.89491085427258 |
nauc_recall_at_1000_diff1 | 3.733355117431255 |
nauc_recall_at_1000_max | 3.9603571352482194 |
nauc_recall_at_1000_std | 70.07345061131205 |
nauc_recall_at_100_diff1 | 29.01903214246288 |
nauc_recall_at_100_max | 40.7515332828621 |
nauc_recall_at_100_std | 62.63424954912607 |
nauc_recall_at_10_diff1 | 2.5762677254911988 |
nauc_recall_at_10_max | 6.0962986337729905 |
nauc_recall_at_10_std | -11.507400451348577 |
nauc_recall_at_1_diff1 | 14.549204048461151 |
nauc_recall_at_1_max | -12.230560087701225 |
nauc_recall_at_1_std | -19.469903650130362 |
nauc_recall_at_20_diff1 | 1.7155401245682675 |
nauc_recall_at_20_max | 21.535464539459632 |
nauc_recall_at_20_std | 1.5379611421957025 |
nauc_recall_at_3_diff1 | 5.7018506525557875 |
nauc_recall_at_3_max | -8.180345365085538 |
nauc_recall_at_3_std | -18.370337505024796 |
nauc_recall_at_5_diff1 | 3.6053552181043913 |
nauc_recall_at_5_max | -5.207647070615579 |
nauc_recall_at_5_std | -19.894910854272492 |
ndcg_at_1 | 34.282000000000004 |
ndcg_at_10 | 59.53000000000001 |
ndcg_at_100 | 62.187000000000005 |
ndcg_at_1000 | 62.243 |
ndcg_at_20 | 61.451 |
ndcg_at_3 | 49.393 |
ndcg_at_5 | 54.771 |
precision_at_1 | 34.282000000000004 |
precision_at_10 | 8.791 |
precision_at_100 | 0.992 |
precision_at_1000 | 0.1 |
precision_at_20 | 4.769 |
precision_at_3 | 20.104 |
precision_at_5 | 14.651 |
recall_at_1 | 34.282000000000004 |
recall_at_10 | 87.909 |
recall_at_100 | 99.21799999999999 |
recall_at_1000 | 99.644 |
recall_at_20 | 95.377 |
recall_at_3 | 60.313 |
recall_at_5 | 73.257 |
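The ndcg_at_k rows above follow the standard binary-relevance NDCG definition: discounted gain over the top-k results, normalized by the ideal ordering. A minimal self-contained sketch (the relevance labels are made up for illustration, not ArguAna data):

```python
import math

def dcg_at_k(rels, k):
    """Discounted cumulative gain over the top-k relevance labels."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))

def ndcg_at_k(rels, k):
    """DCG normalized by the ideal (sorted-descending) DCG."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# Binary relevance of a ranked result list, best position first (made-up example):
ranked_rels = [1, 0, 1, 0, 0]
score = ndcg_at_k(ranked_rels, 3)
```

Benchmark tables like the one above average this per-query score over all queries, so ndcg_at_10 = 59.53 means an average NDCG@10 of about 0.595.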
MTEB CQADupstackAndroidRetrieval dataset
Metric | Value |
---|---|
main_score | 53.885000000000005 |
map_at_1 | 35.429 |
map_at_10 | 47.469 |
map_at_100 | 48.997 |
map_at_1000 | 49.117 |
map_at_20 | 48.324 |
map_at_3 | 43.835 |
map_at_5 | 46.043 |
mrr_at_1 | 43.34763948497854 |
mrr_at_10 | 53.258623430297234 |
mrr_at_100 | 53.99123884299005 |
mrr_at_1000 | 54.02458101713216 |
mrr_at_20 | 53.695964669618945 |
mrr_at_3 | 50.81068192656173 |
mrr_at_5 | 52.45588936576058 |
nauc_map_at_1000_diff1 | 51.55382824218782 |
nauc_map_at_1000_max | 31.855350695084606 |
nauc_map_at_1000_std | -5.465862008150992 |
nauc_map_at_100_diff1 | 51.55889312452534 |
nauc_map_at_100_max | 31.88429637207401 |
nauc_map_at_100_std | -5.40805152544196 |
nauc_map_at_10_diff1 | 51.6592677505875 |
nauc_map_at_10_max | 31.554425233617543 |
nauc_map_at_10_std | -6.125756131339046 |
nauc_map_at_1_diff1 | 55.6889617582672 |
nauc_map_at_1_max | 27.821166966868176 |
nauc_map_at_1_std | -5.778838498211728 |
nauc_map_at_20_diff1 | 51.70520970992564 |
nauc_map_at_20_max | 31.811676633900465 |
nauc_map_at_20_std | -5.463596751904718 |
nauc_map_at_3_diff1 | 53.206169626589606 |
nauc_map_at_3_max | 31.64373830824983 |
nauc_map_at_3_std | -6.054761451312827 |
nauc_map_at_5_diff1 | 52.37308971673694 |
nauc_map_at_5_max | 31.974302019633644 |
nauc_map_at_5_std | -6.302653399940531 |
nauc_mrr_at_1000_diff1 | 49.345152231490616 |
nauc_mrr_at_1000_max | 33.49789501712511 |
nauc_mrr_at_1000_std | -6.054730861163538 |
nauc_mrr_at_100_diff1 | 49.3387577601307 |
nauc_mrr_at_100_max | 33.48149992464187 |
nauc_mrr_at_100_std | -6.061177137579308 |
nauc_mrr_at_10_diff1 | 49.08312288449718 |
nauc_mrr_at_10_max | 33.470393322577465 |
nauc_mrr_at_10_std | -6.180286430216975 |
nauc_mrr_at_1_diff1 | 52.43364978537192 |
nauc_mrr_at_1_max | 31.521755633355713 |
nauc_mrr_at_1_std | -7.002499524130836 |
nauc_mrr_at_20_diff1 | 49.311059224991766 |
nauc_mrr_at_20_max | 33.538523037692144 |
nauc_mrr_at_20_std | -6.034619474981136 |
nauc_mrr_at_3_diff1 | 49.90489868439366 |
nauc_mrr_at_3_max | 34.400493912164606 |
nauc_mrr_at_3_std | -6.028875320994629 |
nauc_mrr_at_5_diff1 | 49.033661898983475 |
nauc_mrr_at_5_max | 33.732315350193936 |
nauc_mrr_at_5_std | -6.272548556330368 |
nauc_ndcg_at_1000_diff1 | 49.81681892539247 |
nauc_ndcg_at_1000_max | 33.06518006062093 |
nauc_ndcg_at_1000_std | -4.282105713014755 |
nauc_ndcg_at_100_diff1 | 49.42362108857786 |
nauc_ndcg_at_100_max | 32.92024325540483 |
nauc_ndcg_at_100_std | -3.7786765305496717 |
nauc_ndcg_at_10_diff1 | 48.83102435475594 |
nauc_ndcg_at_10_max | 31.898404563611958 |
nauc_ndcg_at_10_std | -6.2024003866707 |
nauc_ndcg_at_1_diff1 | 52.43364978537192 |
nauc_ndcg_at_1_max | 31.521755633355713 |
nauc_ndcg_at_1_std | -7.002499524130836 |
nauc_ndcg_at_20_diff1 | 49.466526454438316 |
nauc_ndcg_at_20_max | 32.424462698701674 |
nauc_ndcg_at_20_std | -4.520809563712905 |
nauc_ndcg_at_3_diff1 | 50.997884562583884 |
nauc_ndcg_at_3_max | 33.26787046916917 |
nauc_ndcg_at_3_std | -6.340699471083753 |
nauc_ndcg_at_5_diff1 | 49.68314458398097 |
nauc_ndcg_at_5_max | 32.80910071143984 |
nauc_ndcg_at_5_std | -6.734495576445887 |
nauc_precision_at_1000_diff1 | -24.18940012795299 |
nauc_precision_at_1000_max | -10.995343674356896 |
nauc_precision_at_1000_std | -8.298841004724856 |
nauc_precision_at_100_diff1 | -18.104939577865935 |
nauc_precision_at_100_max | -1.3757613100627637 |
nauc_precision_at_100_std | 0.07661922190466432 |
nauc_precision_at_10_diff1 | 3.9624459059275967 |
nauc_precision_at_10_max | 14.841561593450391 |
nauc_precision_at_10_std | -2.485374333613117 |
nauc_precision_at_1_diff1 | 52.43364978537192 |
nauc_precision_at_1_max | 31.521755633355713 |
nauc_precision_at_1_std | -7.002499524130836 |
nauc_precision_at_20_diff1 | -4.4791763436505265 |
nauc_precision_at_20_max | 9.157872836996276 |
nauc_precision_at_20_std | 2.086903518342088 |
nauc_precision_at_3_diff1 | 28.480888018235568 |
nauc_precision_at_3_max | 30.34526267718485 |
nauc_precision_at_3_std | -6.3006706923866025 |
nauc_precision_at_5_diff1 | 16.488039195453517 |
nauc_precision_at_5_max | 24.593477099241852 |
nauc_precision_at_5_std | -5.316448107840636 |
nauc_recall_at_1000_diff1 | 34.715187316533076 |
nauc_recall_at_1000_max | 58.2266544684947 |
nauc_recall_at_1000_std | 63.85237636398278 |
nauc_recall_at_100_diff1 | 36.08623826028132 |
nauc_recall_at_100_max | 33.05011429439473 |
nauc_recall_at_100_std | 16.559545021212564 |
nauc_recall_at_10_diff1 | 39.76738610714205 |
nauc_recall_at_10_max | 28.233045706945997 |
nauc_recall_at_10_std | -5.13243784043598 |
nauc_recall_at_1_diff1 | 55.6889617582672 |
nauc_recall_at_1_max | 27.821166966868176 |
nauc_recall_at_1_std | -5.778838498211728 |
nauc_recall_at_20_diff1 | 41.18682480073759 |
nauc_recall_at_20_max | 29.525993239296945 |
nauc_recall_at_20_std | 1.5003598438954298 |
nauc_recall_at_3_diff1 | 48.31879460301157 |
nauc_recall_at_3_max | 32.93751306970167 |
nauc_recall_at_3_std | -5.28070084211707 |
nauc_recall_at_5_diff1 | 44.327686388315435 |
nauc_recall_at_5_max | 32.04823486234599 |
nauc_recall_at_5_std | -6.4221525602778256 |
ndcg_at_1 | 43.348 |
ndcg_at_10 | 53.885000000000005 |
ndcg_at_100 | 59.204 |
ndcg_at_1000 | 60.744 |
ndcg_at_20 | 55.995 |
ndcg_at_3 | 49.112 |
ndcg_at_5 | 51.61900000000001 |
precision_at_1 | 43.348 |
precision_at_10 | 10.242999999999999 |
precision_at_100 | 1.6150000000000002 |
precision_at_1000 | 0.203 |
precision_at_20 | 6.066 |
precision_at_3 | 23.605 |
precision_at_5 | 17.024 |
recall_at_1 | 35.429 |
recall_at_10 | 65.77199999999999 |
recall_at_100 | 87.89 |
recall_at_1000 | 97.13000000000001 |
recall_at_20 | 73.299 |
recall_at_3 | 52.034000000000006 |
recall_at_5 | 58.96 |
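The mrr_at_k rows report mean reciprocal rank: the average, over queries, of 1/rank of the first relevant result within the top k (0 when nothing relevant appears). A minimal sketch with made-up per-query rankings:

```python
def mrr_at_k(relevance_lists, k):
    """Mean reciprocal rank: average of 1/rank of the first relevant hit in the top k."""
    total = 0.0
    for rels in relevance_lists:
        for i, rel in enumerate(rels[:k]):
            if rel:
                total += 1.0 / (i + 1)
                break
    return total / len(relevance_lists)

# Three hypothetical queries; 1 marks a relevant document at that rank:
runs = [
    [1, 0, 0],  # relevant at rank 1 -> 1.0
    [0, 0, 1],  # relevant at rank 3 -> 1/3
    [0, 0, 0],  # no relevant hit    -> 0.0
]
mrr = mrr_at_k(runs, 3)
```

MRR rewards putting the first relevant answer as high as possible, which is why it tracks closely with map_at_k on datasets with roughly one relevant document per query.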
MTEB CQADupstackEnglishRetrieval dataset
Metric | Value |
---|---|
main_score | 49.55 |
map_at_1 | 31.684 |
map_at_10 | 43.258 |
map_at_100 | 44.628 |
map_at_1000 | 44.761 |
map_at_20 | 44.015 |
map_at_3 | 39.778000000000006 |
map_at_5 | 41.643 |
mrr_at_1 | 39.87261146496815 |
mrr_at_10 | 49.31978566373469 |
mrr_at_100 | 49.94922739445482 |
mrr_at_1000 | 49.990325601254106 |
mrr_at_20 | 49.70597468576704 |
mrr_at_3 | 47.070063694267546 |
mrr_at_5 | 48.23248407643316 |
nauc_map_at_1000_diff1 | 53.44044712371752 |
nauc_map_at_1000_max | 34.5651440062204 |
nauc_map_at_1000_std | -0.9814384609230475 |
nauc_map_at_100_diff1 | 53.429004435388464 |
nauc_map_at_100_max | 34.52038957273436 |
nauc_map_at_100_std | -1.1021936362699805 |
nauc_map_at_10_diff1 | 53.879128574022005 |
nauc_map_at_10_max | 33.74771524140917 |
nauc_map_at_10_std | -2.945132777205236 |
nauc_map_at_1_diff1 | 60.25159799695403 |
nauc_map_at_1_max | 26.843892985235808 |
nauc_map_at_1_std | -9.618702739509093 |
nauc_map_at_20_diff1 | 53.56789898225283 |
nauc_map_at_20_max | 34.11628845872402 |
nauc_map_at_20_std | -2.024376635870884 |
nauc_map_at_3_diff1 | 54.45882099014072 |
nauc_map_at_3_max | 31.29495446507793 |
nauc_map_at_3_std | -6.3 |
(The remaining metric values are not fully listed in the original document.)
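The precision_at_k and recall_at_k rows in the tables above are the usual top-k set metrics: precision is the fraction of the top-k results that are relevant, recall the fraction of all relevant documents found in the top k. A quick self-contained sketch (document ids and relevance set are hypothetical):

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Top-k precision and recall over document-id sets."""
    top_k = retrieved[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    return hits / k, hits / len(relevant)

retrieved = ["d3", "d7", "d1", "d9"]  # ranked result list (made up)
relevant = {"d1", "d3", "d5"}         # ground-truth relevant set
p, r = precision_recall_at_k(retrieved, relevant, 3)
```

This also explains why precision_at_1000 sits near 0.1 in the ArguAna table: with roughly one relevant document per query, at most 1 of 1000 retrieved results can be a hit.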
Model Information
Property | Details |
---|---|
Model type | Model for sentence-similarity and retrieval tasks |
Training data | Not specified |
License
This model is released under the apache-2.0 license.