PEG
Developed by TownsWu
PEG is a model for robust text retrieval via progressive learning, adjusting loss weights according to the difficulty level of negative samples.
Downloads: 36
Released: 4/25/2025
Model Overview
The PEG model (progressive-learning text embeddings) progressively adjusts the weights of the samples contributing to the loss within extremely large batches, according to the difficulty level of negative samples. It is suited to text retrieval tasks across a wide range of domains.
Model Features
Progressive learning
Loss weights are adjusted progressively according to the difficulty level of negative samples, improving model performance.
Large-scale training data
Trained on a dataset of over 110 million entries spanning a wide range of domains.
Robust text retrieval
Performs strongly across many domains, including general knowledge, finance, tourism, and medicine.
Model Capabilities
Text embedding generation
Sentence similarity computation
Text retrieval
Question-answering system support
Use Cases
Medical domain
Medical QA retrieval
Retrieves related questions in medical question-answering systems
Performs strongly on the CMedQAv1 and CMedQAv2 datasets
General retrieval
General text retrieval
Applicable to text retrieval tasks across a wide range of domains
Performs well on multiple retrieval benchmarks
🚀 PEG: Robust Text Retrieval with Progressive Learning
The PEG model (progressive-learning text embeddings) progressively adjusts the weights of the samples contributing to the loss within extremely large batches, according to the difficulty level of negative samples. It was trained on over 110 million data entries covering domains such as general knowledge, finance, tourism, and medicine.
🚀 Quick Start
Install the dependencies:
pip install transformers
Load the model and run inference:
from transformers import AutoModel, AutoTokenizer
import torch

# Load the model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('TownsWu/PEG')
model = AutoModel.from_pretrained('TownsWu/PEG')
sentences = ['如何更换花呗绑定银行卡', '花呗更改绑定银行卡']
# Tokenize the sentences
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# Compute the embeddings (CLS-token pooling)
with torch.no_grad():
    last_hidden_state = model(**inputs, return_dict=True).last_hidden_state
embeddings = last_hidden_state[:, 0]
print("embeddings:")
print(embeddings)
✨ Key Features
- Progressive learning: PEG progressively adjusts the weights of the samples contributing to the loss within extremely large batches, according to the difficulty level of negative samples, enabling more efficient learning.
- Multi-domain data: trained on over 110 million data entries covering domains such as general knowledge, finance, tourism, and medicine, giving the model broad applicability.
📦 Installation
Install the transformers library:
pip install transformers
💻 Usage Examples
Basic usage:
from transformers import AutoModel, AutoTokenizer
import torch

# Load the model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('TownsWu/PEG')
model = AutoModel.from_pretrained('TownsWu/PEG')
sentences = ['如何更换花呗绑定银行卡', '花呗更改绑定银行卡']
# Tokenize the sentences
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# Compute the embeddings (CLS-token pooling)
with torch.no_grad():
    last_hidden_state = model(**inputs, return_dict=True).last_hidden_state
embeddings = last_hidden_state[:, 0]
print("embeddings:")
print(embeddings)
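For retrieval, the CLS embeddings produced above are typically L2-normalized and ranked by cosine similarity. The sketch below uses fixed stand-in vectors instead of real model outputs so it runs without downloading the model; only the scoring step is shown:

```python
import torch
import torch.nn.functional as F

# Stand-in embeddings: in practice these come from the CLS pooling shown
# above (last_hidden_state[:, 0]); fixed values keep the example self-contained.
query = torch.tensor([[1.0, 0.0, 1.0]])
docs = torch.tensor([
    [0.9, 0.1, 1.1],   # close to the query
    [-1.0, 0.5, 0.0],  # far from the query
])

# L2-normalize, then a matrix product gives cosine similarities.
scores = F.normalize(query, dim=-1) @ F.normalize(docs, dim=-1).T
ranking = scores.squeeze(0).argsort(descending=True)
print(ranking.tolist())  # → [0, 1]: the first document ranks highest
```

The same pattern scales to a corpus: stack all document embeddings into one matrix, normalize once, and take the top-k scores per query.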
📚 Documentation
Model Details
We propose PEG (progressive-learning text embeddings), which progressively adjusts the weights of the samples contributing to the loss within extremely large batches, according to the difficulty level of negative samples. We collected over 110 million data entries covering a wide range of domains such as general knowledge, finance, tourism, and medicine.
Our technical report is available on arXiv (arXiv:2311.11691).
Contact
If you have questions or suggestions about this project, feel free to open an issue or a pull request. You can also reach Tong Wu by email at townswu@tencent.com.
Citation
If you find our work helpful for your research, please consider citing the following BibTeX entry:
@article{wu2023towards,
  title={Towards Robust Text Retrieval with Progressive Learning},
  author={Wu, Tong and Qin, Yulei and Zhang, Enwei and Xu, Zihan and Gao, Yuting and Li, Ke and Sun, Xing},
  journal={arXiv preprint arXiv:2311.11691},
  year={2023}
}
🔧 Technical Details
PEG builds on progressive learning: during training, sample weights are adjusted dynamically according to the difficulty level of negative samples. This helps the model handle complex text retrieval tasks, improving both retrieval accuracy and robustness.
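The idea can be sketched as a contrastive (InfoNCE-style) loss whose negative terms are re-weighted by estimated difficulty as training advances. This is an illustrative sketch under our own assumptions (the `weighted_info_nce` helper and its `progress` ramp are hypothetical), not PEG's published formulation:

```python
import torch
import torch.nn.functional as F

def weighted_info_nce(query, positive, negatives, progress, temperature=0.05):
    """Contrastive loss whose negative terms are up-weighted by difficulty.

    `progress` in [0, 1] is the fraction of training completed; harder
    negatives (those most similar to the query) receive larger weights as
    training advances. Hypothetical sketch, not PEG's exact formulation.
    """
    q = F.normalize(query, dim=-1)           # (B, D)
    pos = F.normalize(positive, dim=-1)      # (B, D)
    neg = F.normalize(negatives, dim=-1)     # (N, D)
    pos_sim = (q * pos).sum(-1, keepdim=True) / temperature  # (B, 1)
    neg_sim = (q @ neg.T) / temperature                      # (B, N)
    # Difficulty proxy: softmax over negative similarities. Weights stay at 1
    # early in training (progress≈0) and rise toward the hardest negatives.
    weights = 1.0 + progress * torch.softmax(neg_sim, dim=-1).detach()
    # Adding log-weights to the logits multiplies each negative's exp() term.
    logits = torch.cat([pos_sim, neg_sim + weights.log()], dim=-1)
    labels = torch.zeros(query.size(0), dtype=torch.long)  # positive is index 0
    return F.cross_entropy(logits, labels)
```

With `progress=0` this reduces to plain InfoNCE; raising `progress` increases the contribution of hard negatives, which is the qualitative behavior the section describes.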
📄 License
This project is released under the Apache-2.0 license.
📊 Model Metrics
Task Type | Dataset | Metric | Value |
---|---|---|---|
Reranking | C-MTEB/CMedQAv1-reranking | map | 84.09137463267582 |
Reranking | C-MTEB/CMedQAv1-reranking | mrr | 86.6288888888889 |
Reranking | C-MTEB/CMedQAv2-reranking | map | 86.55765031914974 |
Reranking | C-MTEB/CMedQAv2-reranking | mrr | 89.4325396825397 |
Retrieval | C_MTEB/CmedqaRetrieval | map_at_1 | 26.101000000000003 |
Retrieval | C_MTEB/CmedqaRetrieval | map_at_10 | 38.239000000000004 |
Retrieval | C_MTEB/CmedqaRetrieval | map_at_100 | 40.083 |
Retrieval | C_MTEB/CmedqaRetrieval | map_at_1000 | 40.205 |
Retrieval | C_MTEB/CmedqaRetrieval | map_at_3 | 34.386 |
Retrieval | C_MTEB/CmedqaRetrieval | map_at_5 | 36.425999999999995 |
Retrieval | C_MTEB/CmedqaRetrieval | mrr_at_1 | 39.434999999999995 |
Retrieval | C_MTEB/CmedqaRetrieval | mrr_at_10 | 46.967999999999996 |
Retrieval | C_MTEB/CmedqaRetrieval | mrr_at_100 | 47.946 |
Retrieval | C_MTEB/CmedqaRetrieval | mrr_at_1000 | 47.997 |
Retrieval | C_MTEB/CmedqaRetrieval | mrr_at_3 | 44.803 |
Retrieval | C_MTEB/CmedqaRetrieval | mrr_at_5 | 45.911 |
Retrieval | C_MTEB/CmedqaRetrieval | ndcg_at_1 | 39.434999999999995 |
Retrieval | C_MTEB/CmedqaRetrieval | ndcg_at_10 | 44.416 |
Retrieval | C_MTEB/CmedqaRetrieval | ndcg_at_100 | 51.773 |
Retrieval | C_MTEB/CmedqaRetrieval | ndcg_at_1000 | 53.888000000000005 |
Retrieval | C_MTEB/CmedqaRetrieval | ndcg_at_3 | 39.816 |
Retrieval | C_MTEB/CmedqaRetrieval | ndcg_at_5 | 41.467999999999996 |
Retrieval | C_MTEB/CmedqaRetrieval | precision_at_1 | 39.434999999999995 |
Retrieval | C_MTEB/CmedqaRetrieval | precision_at_10 | 9.786999999999999 |
Retrieval | C_MTEB/CmedqaRetrieval | precision_at_100 | 1.5810000000000002 |
Retrieval | C_MTEB/CmedqaRetrieval | precision_at_1000 | 0.184 |
Retrieval | C_MTEB/CmedqaRetrieval | precision_at_3 | 22.414 |
Retrieval | C_MTEB/CmedqaRetrieval | precision_at_5 | 15.943999999999999 |
Retrieval | C_MTEB/CmedqaRetrieval | recall_at_1 | 26.101000000000003 |
Retrieval | C_MTEB/CmedqaRetrieval | recall_at_10 | 53.82900000000001 |
Retrieval | C_MTEB/CmedqaRetrieval | recall_at_100 | 84.63199999999999 |
Retrieval | C_MTEB/CmedqaRetrieval | recall_at_1000 | 98.782 |
Retrieval | C_MTEB/CmedqaRetrieval | recall_at_3 | 39.585 |
Retrieval | C_MTEB/CmedqaRetrieval | recall_at_5 | 45.141 |
Retrieval | C_MTEB/CovidRetrieval | map_at_1 | 70.39 |
Retrieval | C_MTEB/CovidRetrieval | map_at_10 | 78.93599999999999 |
Retrieval | C_MTEB/CovidRetrieval | map_at_100 | 79.202 |
Retrieval | C_MTEB/CovidRetrieval | map_at_1000 | 79.205 |
Retrieval | C_MTEB/CovidRetrieval | map_at_3 | 77.538 |
Retrieval | C_MTEB/CovidRetrieval | map_at_5 | 78.312 |
Retrieval | C_MTEB/CovidRetrieval | mrr_at_1 | 70.706 |
Retrieval | C_MTEB/CovidRetrieval | mrr_at_10 | 79.018 |
Retrieval | C_MTEB/CovidRetrieval | mrr_at_100 | 79.28399999999999 |
Retrieval | C_MTEB/CovidRetrieval | mrr_at_1000 | 79.288 |
Retrieval | C_MTEB/CovidRetrieval | mrr_at_3 | 77.713 |
Retrieval | C_MTEB/CovidRetrieval | mrr_at_5 | 78.462 |
Retrieval | C_MTEB/CovidRetrieval | ndcg_at_1 | 70.601 |
Retrieval | C_MTEB/CovidRetrieval | ndcg_at_10 | 82.555 |
Retrieval | C_MTEB/CovidRetrieval | ndcg_at_100 | 83.718 |
Retrieval | C_MTEB/CovidRetrieval | ndcg_at_1000 | 83.855 |
Retrieval | C_MTEB/CovidRetrieval | ndcg_at_3 | 79.779 |
Retrieval | C_MTEB/CovidRetrieval | ndcg_at_5 | 81.149 |
Retrieval | C_MTEB/CovidRetrieval | precision_at_1 | 70.601 |
Retrieval | C_MTEB/CovidRetrieval | precision_at_10 | 9.463000000000001 |
Retrieval | C_MTEB/CovidRetrieval | precision_at_100 | 0.9979999999999999 |
Retrieval | C_MTEB/CovidRetrieval | precision_at_1000 | 0.101 |
Retrieval | C_MTEB/CovidRetrieval | precision_at_3 | 28.871999999999996 |
Retrieval | C_MTEB/CovidRetrieval | precision_at_5 | 18.019 |
Retrieval | C_MTEB/CovidRetrieval | recall_at_1 | 70.39 |
Retrieval | C_MTEB/CovidRetrieval | recall_at_10 | 93.572 |
Retrieval | C_MTEB/CovidRetrieval | recall_at_100 | 98.736 |
Retrieval | C_MTEB/CovidRetrieval | recall_at_1000 | 99.895 |
Retrieval | C_MTEB/CovidRetrieval | recall_at_3 | 86.091 |
Retrieval | C_MTEB/CovidRetrieval | recall_at_5 | 89.384 |
Retrieval | C_MTEB/DuRetrieval | map_at_1 | 26.147 |
Retrieval | C_MTEB/DuRetrieval | map_at_10 | 80.205 |
Retrieval | C_MTEB/DuRetrieval | map_at_100 | 82.96 |
Retrieval | C_MTEB/DuRetrieval | map_at_1000 | 82.999 |
Retrieval | C_MTEB/DuRetrieval | map_at_3 | 55.16799999999999 |
Retrieval | C_MTEB/DuRetrieval | map_at_5 | 69.798 |
Retrieval | C_MTEB/DuRetrieval | mrr_at_1 | 89.8 |
Retrieval | C_MTEB/DuRetrieval | mrr_at_10 | 93.16799999999999 |
Retrieval | C_MTEB/DuRetrieval | mrr_at_100 | 93.22500000000001 |
Retrieval | C_MTEB/DuRetrieval | mrr_at_1000 | 93.228 |
Retrieval | C_MTEB/DuRetrieval | mrr_at_3 | 92.85 |
Retrieval | C_MTEB/DuRetrieval | mrr_at_5 | 93.067 |
Retrieval | C_MTEB/DuRetrieval | ndcg_at_1 | 89.8 |
Retrieval | C_MTEB/DuRetrieval | ndcg_at_10 | 87.668 |
Retrieval | C_MTEB/DuRetrieval | ndcg_at_100 | 90.16 |
Retrieval | C_MTEB/DuRetrieval | ndcg_at_1000 | 90.505 |
Retrieval | C_MTEB/DuRetrieval | ndcg_at_3 | 85.842 |
Retrieval | C_MTEB/DuRetrieval | ndcg_at_5 | 85.101 |
Retrieval | C_MTEB/DuRetrieval | precision_at_1 | 89.8 |
Retrieval | C_MTEB/DuRetrieval | precision_at_10 | 42.225 |
Retrieval | C_MTEB/DuRetrieval | precision_at_100 | 4.8149999999999995 |
Retrieval | C_MTEB/DuRetrieval | precision_at_1000 | 0.48900000000000005 |
Retrieval | C_MTEB/DuRetrieval | precision_at_3 | 76.967 |
Retrieval | C_MTEB/DuRetrieval | precision_at_5 | 65.32 |
Retrieval | C_MTEB/DuRetrieval | recall_at_1 | 26.147 |
Retrieval | C_MTEB/DuRetrieval | recall_at_10 | 89.30399999999999 |
Retrieval | C_MTEB/DuRetrieval | recall_at_100 | 97.609 |
Retrieval | C_MTEB/DuRetrieval | recall_at_1000 | 99.409 |
Retrieval | C_MTEB/DuRetrieval | recall_at_3 | 57.56 |
Retrieval | C_MTEB/DuRetrieval | recall_at_5 | 74.78200000000001 |
Retrieval | C_MTEB/EcomRetrieval | map_at_1 | 53.300000000000004 |
Retrieval | C_MTEB/EcomRetrieval | map_at_10 | 62.507000000000005 |
Retrieval | C_MTEB/EcomRetrieval | map_at_100 | 63.068000000000005 |
Retrieval | C_MTEB/EcomRetrieval | map_at_1000 | 63.08200000000001 |
Retrieval | C_MTEB/EcomRetrieval | map_at_3 | 60.050000000000004 |
Retrieval | C_MTEB/EcomRetrieval | map_at_5 | 61.41 |
Retrieval | C_MTEB/EcomRetrieval | mrr_at_1 | 53.300000000000004 |
Retrieval | C_MTEB/EcomRetrieval | mrr_at_10 | 62.507000000000005 |
Retrieval | C_MTEB/EcomRetrieval | mrr_at_100 | 63.068000000000005 |
Retrieval | C_MTEB/EcomRetrieval | mrr_at_1000 | 63.08200000000001 |
Retrieval | C_MTEB/EcomRetrieval | mrr_at_3 | 60.050000000000004 |
Retrieval | C_MTEB/EcomRetrieval | mrr_at_5 | 61.41 |
Retrieval | C_MTEB/EcomRetrieval | ndcg_at_1 | 53.300000000000004 |
Retrieval | C_MTEB/EcomRetrieval | ndcg_at_10 | 67.31700000000001 |
Retrieval | C_MTEB/EcomRetrieval | ndcg_at_100 | 69.862 |
Retrieval | C_MTEB/EcomRetrieval | ndcg_at_1000 | 70.231 |
Retrieval | C_MTEB/EcomRetrieval | ndcg_at_3 | 62.222 |
Retrieval | C_MTEB/EcomRetrieval | ndcg_at_5 | 64.66300000000001 |
Retrieval | C_MTEB/EcomRetrieval | precision_at_1 | 53.300000000000004 |
Retrieval | C_MTEB/EcomRetrieval | precision_at_10 | 8.260000000000002 |
Retrieval | C_MTEB/EcomRetrieval | precision_at_100 | 0.941 |
Retrieval | C_MTEB/EcomRetrieval | precision_at_1000 | 0.097 |
Retrieval | C_MTEB/EcomRetrieval | precision_at_3 | 22.833000000000002 |
Retrieval | C_MTEB/EcomRetrieval | precision_at_5 | 14.879999999999999 |
Retrieval | C_MTEB/EcomRetrieval | recall_at_1 | 53.300000000000004 |
Retrieval | C_MTEB/EcomRetrieval | recall_at_10 | 82.6 |
Retrieval | C_MTEB/EcomRetrieval | recall_at_100 | 94.1 |
Retrieval | C_MTEB/EcomRetrieval | recall_at_1000 | 97.0 |
Retrieval | C_MTEB/EcomRetrieval | recall_at_3 | 68.5 |
Retrieval | C_MTEB/EcomRetrieval | recall_at_5 | 74.4 |
Retrieval | C_MTEB/MMarcoRetrieval | map_at_1 | 70.68799999999999 |
Retrieval | C_MTEB/MMarcoRetrieval | map_at_10 | 79.28399999999999 |
Retrieval | C_MTEB/MMarcoRetrieval | map_at_100 | 79.537 |
Retrieval | C_MTEB/MMarcoRetrieval | map_at_1000 | 79.545 |
Retrieval | C_MTEB/MMarcoRetrieval | map_at_3 | 77.643 |
Retrieval | C_MTEB/MMarcoRetrieval | map_at_5 | 78.694 |
Retrieval | C_MTEB/MMarcoRetrieval | mrr_at_1 | 73.05199999999999 |
Retrieval | C_MTEB/MMarcoRetrieval | mrr_at_10 | 79.794 |
Retrieval | C_MTEB/MMarcoRetrieval | mrr_at_100 | 80.024 |
Retrieval | C_MTEB/MMarcoRetrieval | mrr_at_1000 | 80.03099999999999 |
Retrieval | C_MTEB/MMarcoRetrieval | mrr_at_3 | 78.441 |
Retrieval | C_MTEB/MMarcoRetrieval | mrr_at_5 | 79.29 |
Retrieval | C_MTEB/MMarcoRetrieval | ndcg_at_1 | 73.05199999999999 |
Retrieval | C_MTEB/MMarcoRetrieval | ndcg_at_10 | 82.627 |
Retrieval | C_MTEB/MMarcoRetrieval | ndcg_at_100 | 83.737 |
Retrieval | C_MTEB/MMarcoRetrieval | ndcg_at_1000 | 83.946 |
Retrieval | C_MTEB/MMarcoRetrieval | ndcg_at_3 | 79.585 |
Retrieval | C_MTEB/MMarcoRetrieval | ndcg_at_5 | 81.306 |
Retrieval | C_MTEB/MMarcoRetrieval | precision_at_1 | 73.05199999999999 |
Retrieval | C_MTEB/MMarcoRetrieval | precision_at_10 | 9.835 |
Retrieval | C_MTEB/MMarcoRetrieval | precision_at_100 | 1.038 |
Retrieval | C_MTEB/MMarcoRetrieval | precision_at_1000 | 0.106 |
Retrieval | C_MTEB/MMarcoRetrieval | precision_at_3 | 29.756 |
Retrieval | C_MTEB/MMarcoRetrieval | precision_at_5 | 18.788 |
Retrieval | C_MTEB/MMarcoRetrieval | recall_at_1 | 70.68799999999999 |
Retrieval | C_MTEB/MMarcoRetrieval | recall_at_10 | 92.38300000000001 |
Retrieval | C_MTEB/MMarcoRetrieval | recall_at_100 | 97.347 |
Retrieval | C_MTEB/MMarcoRetrieval | recall_at_1000 | 98.992 |
Retrieval | C_MTEB/MMarcoRetrieval | recall_at_3 | 84.37 |
Retrieval | C_MTEB/MMarcoRetrieval | recall_at_5 | 88.434 |
Retrieval | C_MTEB/MedicalRetrieval | map_at_1 | 53.1 |
Retrieval | C_MTEB/MedicalRetrieval | map_at_10 | 58.36599999999999 |
Retrieval | C_MTEB/MedicalRetrieval | map_at_100 | 58.939 |
Retrieval | C_MTEB/MedicalRetrieval | map_at_1000 | 58.99100000000001 |
Retrieval | C_MTEB/MedicalRetrieval | map_at_3 | 57.15 |
Retrieval | C_MTEB/MedicalRetrieval | map_at_5 | 57.794999999999995 |
Retrieval | C_MTEB/MedicalRetrieval | mrr_at_1 | 53.2 |
Retrieval | C_MTEB/MedicalRetrieval | mrr_at_10 | 58.416000000000004 |
Retrieval | C_MTEB/MedicalRetrieval | mrr_at_100 | 58.989999999999995 |
Retrieval | C_MTEB/MedicalRetrieval | mrr_at_1000 | 59.041 |
Retrieval | C_MTEB/MedicalRetrieval | mrr_at_3 | 57.199999999999996 |
Retrieval | C_MTEB/MedicalRetrieval | mrr_at_5 | 57.845 |
Retrieval | C_MTEB/MedicalRetrieval | ndcg_at_1 | 53.1 |
Retrieval | C_MTEB/MedicalRetrieval | ndcg_at_10 | 60.989000000000004 |
Retrieval | C_MTEB/MedicalRetrieval | ndcg_at_100 | 63.967 |
Retrieval | C_MTEB/MedicalRetrieval | ndcg_at_1000 | 65.436 |
Retrieval | C_MTEB/MedicalRetrieval | ndcg_at_3 | 58.425000000000004 |
Retrieval | C_MTEB/MedicalRetrieval | ndcg_at_5 | 59.583 |
Retrieval | C_MTEB/MedicalRetrieval | precision_at_1 | 53.1 |
Retrieval | C_MTEB/MedicalRetrieval | precision_at_10 | 6.93 |
Retrieval | C_MTEB/MedicalRetrieval | precision_at_100 | 0.8370000000000001 |
Retrieval | C_MTEB/MedicalRetrieval | precision_at_1000 | 0.096 |
Retrieval | C_MTEB/MedicalRetrieval | precision_at_3 | 20.7 |
Retrieval | C_MTEB/MedicalRetrieval | precision_at_5 | 12.98 |
Retrieval | C_MTEB/MedicalRetrieval | recall_at_1 | 53.1 |
Retrieval | C_MTEB/MedicalRetrieval | recall_at_10 | 69.3 |
Retrieval | C_MTEB/MedicalRetrieval | recall_at_100 | 83.7 |
Retrieval | C_MTEB/MedicalRetrieval | recall_at_1000 | 95.5 |
Retrieval | C_MTEB/MedicalRetrieval | recall_at_3 | 62.1 |
Retrieval | C_MTEB/MedicalRetrieval | recall_at_5 | 64.9 |
Reranking | C-MTEB/Mmarco-reranking | map | 33.548800108363665 |
Reranking | C-MTEB/Mmarco-reranking | mrr | 32.529761904761905 |
Reranking | C-MTEB/T2Reranking | map | 69.43381583724414 |
Reranking | C-MTEB/T2Reranking | mrr | 80.47879657392181 |
Retrieval | C_MTEB/T2Retrieval | map_at_1 | 28.116000000000003 |
Retrieval | C_MTEB/T2Retrieval | map_at_10 | 80.026 |
Retrieval | C_MTEB/T2Retrieval | map_at_100 | 83.541 |
Retrieval | C_MTEB/T2Retrieval | map_at_1000 | 83.592 |
Retrieval | C_MTEB/T2Retrieval | map_at_3 | 56.092 |
Retrieval | C_MTEB/T2Retrieval | map_at_5 | 69.114 |
Retrieval | C_MTEB/T2Retrieval | mrr_at_1 | 91.557 |
Retrieval | C_MTEB/T2Retrieval | mrr_at_10 | 93.73700000000001 |
Retrieval | C_MTEB/T2Retrieval | mrr_at_100 | 93.808 |
Retrieval | C_MTEB/T2Retrieval | mrr_at_1000 | 93.811 |
Retrieval | C_MTEB/T2Retrieval | mrr_at_3 | 93.384 |
Retrieval | C_MTEB/T2Retrieval | mrr_at_5 | 93.614 |
Retrieval | C_MTEB/T2Retrieval | ndcg_at_1 | 91.553 |
Retrieval | C_MTEB/T2Retrieval | ndcg_at_10 | 87.003 |
Retrieval | C_MTEB/T2Retrieval | ndcg_at_100 | 90.128 |
Retrieval | C_MTEB/T2Retrieval | ndcg_at_1000 | 90.615 |
Retrieval | C_MTEB/T2Retrieval | ndcg_at_3 | 88.205 |
Retrieval | C_MTEB/T2Retrieval | ndcg_at_5 | 86.978 |
Retrieval | C_MTEB/T2Retrieval | precision_at_1 | 91.553 |
Retrieval | C_MTEB/T2Retrieval | precision_at_10 | 43.25 |
Retrieval | C_MTEB/T2Retrieval | precision_at_100 | 5.067 |
Retrieval | C_MTEB/T2Retrieval | precision_at_1000 | 0.518 |
Retrieval | C_MTEB/T2Retrieval | precision_at_3 | 77.25 |
Retrieval | C_MTEB/T2Retrieval | precision_at_5 | 64.902 |
Retrieval | C_MTEB/T2Retrieval | recall_at_1 | 28.116000000000003 |
Retrieval | C_MTEB/T2Retrieval | recall_at_10 | 85.994 |
Retrieval | C_MTEB/T2Retrieval | recall_at_100 | 96.345 |
Retrieval | C_MTEB/T2Retrieval | recall_at_1000 | 98.867 |
Retrieval | C_MTEB/T2Retrieval | recall_at_3 | 57.67099999999999 |
Retrieval | C_MTEB/T2Retrieval | recall_at_5 | 72.26 |
Retrieval | C_MTEB/VideoRetrieval | map_at_1 | 64.9 |
Retrieval | C_MTEB/VideoRetrieval | map_at_10 | 73.763 |
Retrieval | C_MTEB/VideoRetrieval | map_at_100 | 74.116 |
Retrieval | C_MTEB/VideoRetrieval | map_at_1000 | 74.12100000000001 |
Retrieval | C_MTEB/VideoRetrieval | map_at_3 | 72.15 |
Retrieval | C_MTEB/VideoRetrieval | map_at_5 | 73.25 |
Retrieval | C_MTEB/VideoRetrieval | mrr_at_1 | 64.9 |
Retrieval | C_MTEB/VideoRetrieval | mrr_at_10 | 73.763 |
Retrieval | C_MTEB/VideoRetrieval | mrr_at_100 | 74.116 |
Retrieval | C_MTEB/VideoRetrieval | mrr_at_1000 | 74.12100000000001 |
Retrieval | C_MTEB/VideoRetrieval | mrr_at_3 | 72.15 |
Retrieval | C_MTEB/VideoRetrieval | mrr_at_5 | 73.25 |
Retrieval | C_MTEB/VideoRetrieval | ndcg_at_1 | 64.9 |
Retrieval | C_MTEB/VideoRetrieval | ndcg_at_10 | 77.639 |
Retrieval | C_MTEB/VideoRetrieval | ndcg_at_100 | 79.396 |
Retrieval | C_MTEB/VideoRetrieval | ndcg_at_1000 | 79.554 |
Retrieval | C_MTEB/VideoRetrieval | ndcg_at_3 | 74.406 |
Retrieval | C_MTEB/VideoRetrieval | ndcg_at_5 | 76.385 |
Retrieval | C_MTEB/VideoRetrieval | precision_at_1 | 64.9 |
Retrieval | C_MTEB/VideoRetrieval | precision_at_10 | 8.959999999999999 |
Retrieval | C_MTEB/VideoRetrieval | precision_at_100 | 0.979 |
Retrieval | C_MTEB/VideoRetrieval | precision_at_1000 | 0.099 |
Retrieval | C_MTEB/VideoRetrieval | precision_at_3 | 26.967000000000002 |
Retrieval | C_MTEB/VideoRetrieval | precision_at_5 | 17.14 |
Retrieval | C_MTEB/VideoRetrieval | recall_at_1 | 64.9 |
Retrieval | C_MTEB/VideoRetrieval | recall_at_10 | 89.60000000000001 |
Retrieval | C_MTEB/VideoRetrieval | recall_at_100 | 97.89999999999999 |
Retrieval | C_MTEB/VideoRetrieval | recall_at_1000 | 99.2 |
Retrieval | C_MTEB/VideoRetrieval | recall_at_3 | 80.9 |
Retrieval | C_MTEB/VideoRetrieval | recall_at_5 | 85.7 |