PEG
Developed by TownsWu
PEG is a model for robust text retrieval via progressive learning, which adjusts loss weights according to the difficulty level of negative samples.
Downloads: 36
Released: 4/25/2025
Model Overview
The PEG model (Progressive text Embedding learning) progressively adjusts the weights of the samples contributing to the loss within extremely large batches, based on the difficulty level of negative samples. It is suited to text retrieval tasks across a broad range of domains.
Model Features
Progressive learning
Progressively adjusts loss weights according to the difficulty level of negative samples, improving model performance.
Large-scale training data
Trained on a dataset of more than 110 million examples spanning a broad range of domains.
Robust text retrieval
Performs strongly across many domains, such as general knowledge, finance, tourism, and medicine.
Model Capabilities
Text embedding generation
Sentence similarity computation
Text retrieval
Question-answering system support
Use Cases
Medical domain
Medical QA retrieval
Retrieves relevant questions in medical question-answering systems
Performs strongly on the CMedQAv1 and CMedQAv2 datasets
General retrieval
General text retrieval
Applicable to text retrieval tasks in diverse domains
Performs well across multiple retrieval tasks
🚀 PEG: Robust Text Retrieval with Progressive Learning
The PEG model (Progressive text Embedding learning) progressively adjusts the weights of the samples contributing to the loss within extremely large batches, according to the difficulty level of negative samples. The model was trained on more than 110 million examples covering domains such as general knowledge, finance, tourism, and medicine.
🚀 Quick Start
Install dependencies
pip install transformers
Load the model and run inference
from transformers import AutoModel, AutoTokenizer
import torch

# Load the model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('TownsWu/PEG')
model = AutoModel.from_pretrained('TownsWu/PEG')

sentences = ['如何更換花唄綁定銀行卡', '花唄更改綁定銀行卡']
# Tokenize the sentences
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# Compute the embeddings (the [CLS] token of the last hidden state)
with torch.no_grad():
    last_hidden_state = model(**inputs, return_dict=True).last_hidden_state
embeddings = last_hidden_state[:, 0]
print("embeddings:")
print(embeddings)
✨ Key Features
- Progressive learning mechanism: PEG adjusts the weights of the samples contributing to the loss within extremely large batches, according to the difficulty level of negative samples, enabling more effective learning.
- Multi-domain data: the model was trained on more than 110 million examples covering domains such as general knowledge, finance, tourism, and medicine, giving it broad applicability.
📦 Installation
Install the transformers library:
pip install transformers
💻 Usage Examples
Basic usage
from transformers import AutoModel, AutoTokenizer
import torch

# Load the model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('TownsWu/PEG')
model = AutoModel.from_pretrained('TownsWu/PEG')

sentences = ['如何更換花唄綁定銀行卡', '花唄更改綁定銀行卡']
# Tokenize the sentences
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# Compute the embeddings (the [CLS] token of the last hidden state)
with torch.no_grad():
    last_hidden_state = model(**inputs, return_dict=True).last_hidden_state
embeddings = last_hidden_state[:, 0]
print("embeddings:")
print(embeddings)
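The two example sentences are paraphrases, so their embeddings should be close. A minimal sketch of scoring that closeness with cosine similarity follows; it uses placeholder vectors so the snippet runs without downloading the model, but in practice you would pass the `embeddings` tensor produced above:

```python
import torch
import torch.nn.functional as F

# Placeholder embeddings standing in for the (2, hidden_size) tensor produced
# above; any two row vectors work the same way.
embeddings = torch.tensor([[1.0, 0.0], [0.8, 0.6]])

# L2-normalize each row; the dot product of unit vectors is cosine similarity.
normalized = F.normalize(embeddings, p=2, dim=1)
similarity = (normalized[0] @ normalized[1]).item()
print(f"cosine similarity: {similarity:.4f}")  # → 0.8000 for these vectors
```

Scores close to 1 indicate near-paraphrases; scores near 0 indicate unrelated texts.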
📚 Documentation
Model details
We propose PEG (Progressive text Embedding learning), a model that progressively adjusts the weights of the samples contributing to the loss within extremely large batches, according to the difficulty level of negative samples. We collected more than 110 million examples covering domains such as general knowledge, finance, tourism, and medicine.
Our technical report is available at arXiv:2311.11691.
Contact us
If you have any questions or suggestions about this project, feel free to open an issue or a pull request. You can also reach Tong Wu by email at townswu@tencent.com.
Citation
If you find our work helpful for your research, please consider citing the following BibTeX entry:
@article{wu2023towards,
  title={Towards Robust Text Retrieval with Progressive Learning},
  author={Wu, Tong and Qin, Yulei and Zhang, Enwei and Xu, Zihan and Gao, Yuting and Li, Ke and Sun, Xing},
  journal={arXiv preprint arXiv:2311.11691},
  year={2023}
}
🔧 Technical Details
PEG builds on the idea of progressive learning: during training, sample weights are adjusted dynamically according to the difficulty level of negative samples. This allows the model to better handle complex text retrieval tasks and improves retrieval accuracy and robustness.
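The exact weighting scheme is specified in the technical report. Purely as an illustrative sketch, and not the authors' implementation, difficulty-based re-weighting of in-batch negatives in a contrastive loss could look like the following, where harder negatives (those more similar to the query) receive larger weights:

```python
import torch
import torch.nn.functional as F

def weighted_contrastive_loss(query, docs, temperature=0.05, alpha=1.0):
    """Illustrative InfoNCE-style loss with difficulty-based negative
    weighting. A simplified sketch, NOT the official PEG objective.
    query, docs: (B, D) tensors; docs[i] is the positive for query[i],
    and the remaining rows act as in-batch negatives."""
    q = F.normalize(query, dim=1)
    d = F.normalize(docs, dim=1)
    sim = q @ d.T / temperature                # (B, B) similarity matrix
    with torch.no_grad():
        # Harder negatives (higher similarity) receive larger weights.
        weights = torch.softmax(alpha * sim, dim=1)
        weights.fill_diagonal_(1.0)            # keep positives at full weight
    logits = sim + torch.log(weights)          # re-weight via additive log term
    labels = torch.arange(q.size(0))
    return F.cross_entropy(logits, labels)

torch.manual_seed(0)
loss = weighted_contrastive_loss(torch.randn(8, 16), torch.randn(8, 16))
print(float(loss))
```

In a progressive schedule, the hypothetical `alpha` hyperparameter could be increased over training so that ever-harder negatives dominate the loss; here it is held fixed for simplicity.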
📄 License
This project is released under the Apache-2.0 license.
📊 Model Metrics
Task Type | Dataset | Metric | Value |
---|---|---|---|
Reranking | C-MTEB/CMedQAv1-reranking | map | 84.09137463267582 |
Reranking | C-MTEB/CMedQAv1-reranking | mrr | 86.6288888888889 |
Reranking | C-MTEB/CMedQAv2-reranking | map | 86.55765031914974 |
Reranking | C-MTEB/CMedQAv2-reranking | mrr | 89.4325396825397 |
Retrieval | C_MTEB/CmedqaRetrieval | map_at_1 | 26.101000000000003 |
Retrieval | C_MTEB/CmedqaRetrieval | map_at_10 | 38.239000000000004 |
Retrieval | C_MTEB/CmedqaRetrieval | map_at_100 | 40.083 |
Retrieval | C_MTEB/CmedqaRetrieval | map_at_1000 | 40.205 |
Retrieval | C_MTEB/CmedqaRetrieval | map_at_3 | 34.386 |
Retrieval | C_MTEB/CmedqaRetrieval | map_at_5 | 36.425999999999995 |
Retrieval | C_MTEB/CmedqaRetrieval | mrr_at_1 | 39.434999999999995 |
Retrieval | C_MTEB/CmedqaRetrieval | mrr_at_10 | 46.967999999999996 |
Retrieval | C_MTEB/CmedqaRetrieval | mrr_at_100 | 47.946 |
Retrieval | C_MTEB/CmedqaRetrieval | mrr_at_1000 | 47.997 |
Retrieval | C_MTEB/CmedqaRetrieval | mrr_at_3 | 44.803 |
Retrieval | C_MTEB/CmedqaRetrieval | mrr_at_5 | 45.911 |
Retrieval | C_MTEB/CmedqaRetrieval | ndcg_at_1 | 39.434999999999995 |
Retrieval | C_MTEB/CmedqaRetrieval | ndcg_at_10 | 44.416 |
Retrieval | C_MTEB/CmedqaRetrieval | ndcg_at_100 | 51.773 |
Retrieval | C_MTEB/CmedqaRetrieval | ndcg_at_1000 | 53.888000000000005 |
Retrieval | C_MTEB/CmedqaRetrieval | ndcg_at_3 | 39.816 |
Retrieval | C_MTEB/CmedqaRetrieval | ndcg_at_5 | 41.467999999999996 |
Retrieval | C_MTEB/CmedqaRetrieval | precision_at_1 | 39.434999999999995 |
Retrieval | C_MTEB/CmedqaRetrieval | precision_at_10 | 9.786999999999999 |
Retrieval | C_MTEB/CmedqaRetrieval | precision_at_100 | 1.5810000000000002 |
Retrieval | C_MTEB/CmedqaRetrieval | precision_at_1000 | 0.184 |
Retrieval | C_MTEB/CmedqaRetrieval | precision_at_3 | 22.414 |
Retrieval | C_MTEB/CmedqaRetrieval | precision_at_5 | 15.943999999999999 |
Retrieval | C_MTEB/CmedqaRetrieval | recall_at_1 | 26.101000000000003 |
Retrieval | C_MTEB/CmedqaRetrieval | recall_at_10 | 53.82900000000001 |
Retrieval | C_MTEB/CmedqaRetrieval | recall_at_100 | 84.63199999999999 |
Retrieval | C_MTEB/CmedqaRetrieval | recall_at_1000 | 98.782 |
Retrieval | C_MTEB/CmedqaRetrieval | recall_at_3 | 39.585 |
Retrieval | C_MTEB/CmedqaRetrieval | recall_at_5 | 45.141 |
Retrieval | C_MTEB/CovidRetrieval | map_at_1 | 70.39 |
Retrieval | C_MTEB/CovidRetrieval | map_at_10 | 78.93599999999999 |
Retrieval | C_MTEB/CovidRetrieval | map_at_100 | 79.202 |
Retrieval | C_MTEB/CovidRetrieval | map_at_1000 | 79.205 |
Retrieval | C_MTEB/CovidRetrieval | map_at_3 | 77.538 |
Retrieval | C_MTEB/CovidRetrieval | map_at_5 | 78.312 |
Retrieval | C_MTEB/CovidRetrieval | mrr_at_1 | 70.706 |
Retrieval | C_MTEB/CovidRetrieval | mrr_at_10 | 79.018 |
Retrieval | C_MTEB/CovidRetrieval | mrr_at_100 | 79.28399999999999 |
Retrieval | C_MTEB/CovidRetrieval | mrr_at_1000 | 79.288 |
Retrieval | C_MTEB/CovidRetrieval | mrr_at_3 | 77.713 |
Retrieval | C_MTEB/CovidRetrieval | mrr_at_5 | 78.462 |
Retrieval | C_MTEB/CovidRetrieval | ndcg_at_1 | 70.601 |
Retrieval | C_MTEB/CovidRetrieval | ndcg_at_10 | 82.555 |
Retrieval | C_MTEB/CovidRetrieval | ndcg_at_100 | 83.718 |
Retrieval | C_MTEB/CovidRetrieval | ndcg_at_1000 | 83.855 |
Retrieval | C_MTEB/CovidRetrieval | ndcg_at_3 | 79.779 |
Retrieval | C_MTEB/CovidRetrieval | ndcg_at_5 | 81.149 |
Retrieval | C_MTEB/CovidRetrieval | precision_at_1 | 70.601 |
Retrieval | C_MTEB/CovidRetrieval | precision_at_10 | 9.463000000000001 |
Retrieval | C_MTEB/CovidRetrieval | precision_at_100 | 0.9979999999999999 |
Retrieval | C_MTEB/CovidRetrieval | precision_at_1000 | 0.101 |
Retrieval | C_MTEB/CovidRetrieval | precision_at_3 | 28.871999999999996 |
Retrieval | C_MTEB/CovidRetrieval | precision_at_5 | 18.019 |
Retrieval | C_MTEB/CovidRetrieval | recall_at_1 | 70.39 |
Retrieval | C_MTEB/CovidRetrieval | recall_at_10 | 93.572 |
Retrieval | C_MTEB/CovidRetrieval | recall_at_100 | 98.736 |
Retrieval | C_MTEB/CovidRetrieval | recall_at_1000 | 99.895 |
Retrieval | C_MTEB/CovidRetrieval | recall_at_3 | 86.091 |
Retrieval | C_MTEB/CovidRetrieval | recall_at_5 | 89.384 |
Retrieval | C_MTEB/DuRetrieval | map_at_1 | 26.147 |
Retrieval | C_MTEB/DuRetrieval | map_at_10 | 80.205 |
Retrieval | C_MTEB/DuRetrieval | map_at_100 | 82.96 |
Retrieval | C_MTEB/DuRetrieval | map_at_1000 | 82.999 |
Retrieval | C_MTEB/DuRetrieval | map_at_3 | 55.16799999999999 |
Retrieval | C_MTEB/DuRetrieval | map_at_5 | 69.798 |
Retrieval | C_MTEB/DuRetrieval | mrr_at_1 | 89.8 |
Retrieval | C_MTEB/DuRetrieval | mrr_at_10 | 93.16799999999999 |
Retrieval | C_MTEB/DuRetrieval | mrr_at_100 | 93.22500000000001 |
Retrieval | C_MTEB/DuRetrieval | mrr_at_1000 | 93.228 |
Retrieval | C_MTEB/DuRetrieval | mrr_at_3 | 92.85 |
Retrieval | C_MTEB/DuRetrieval | mrr_at_5 | 93.067 |
Retrieval | C_MTEB/DuRetrieval | ndcg_at_1 | 89.8 |
Retrieval | C_MTEB/DuRetrieval | ndcg_at_10 | 87.668 |
Retrieval | C_MTEB/DuRetrieval | ndcg_at_100 | 90.16 |
Retrieval | C_MTEB/DuRetrieval | ndcg_at_1000 | 90.505 |
Retrieval | C_MTEB/DuRetrieval | ndcg_at_3 | 85.842 |
Retrieval | C_MTEB/DuRetrieval | ndcg_at_5 | 85.101 |
Retrieval | C_MTEB/DuRetrieval | precision_at_1 | 89.8 |
Retrieval | C_MTEB/DuRetrieval | precision_at_10 | 42.225 |
Retrieval | C_MTEB/DuRetrieval | precision_at_100 | 4.8149999999999995 |
Retrieval | C_MTEB/DuRetrieval | precision_at_1000 | 0.48900000000000005 |
Retrieval | C_MTEB/DuRetrieval | precision_at_3 | 76.967 |
Retrieval | C_MTEB/DuRetrieval | precision_at_5 | 65.32 |
Retrieval | C_MTEB/DuRetrieval | recall_at_1 | 26.147 |
Retrieval | C_MTEB/DuRetrieval | recall_at_10 | 89.30399999999999 |
Retrieval | C_MTEB/DuRetrieval | recall_at_100 | 97.609 |
Retrieval | C_MTEB/DuRetrieval | recall_at_1000 | 99.409 |
Retrieval | C_MTEB/DuRetrieval | recall_at_3 | 57.56 |
Retrieval | C_MTEB/DuRetrieval | recall_at_5 | 74.78200000000001 |
Retrieval | C_MTEB/EcomRetrieval | map_at_1 | 53.300000000000004 |
Retrieval | C_MTEB/EcomRetrieval | map_at_10 | 62.507000000000005 |
Retrieval | C_MTEB/EcomRetrieval | map_at_100 | 63.068000000000005 |
Retrieval | C_MTEB/EcomRetrieval | map_at_1000 | 63.08200000000001 |
Retrieval | C_MTEB/EcomRetrieval | map_at_3 | 60.050000000000004 |
Retrieval | C_MTEB/EcomRetrieval | map_at_5 | 61.41 |
Retrieval | C_MTEB/EcomRetrieval | mrr_at_1 | 53.300000000000004 |
Retrieval | C_MTEB/EcomRetrieval | mrr_at_10 | 62.507000000000005 |
Retrieval | C_MTEB/EcomRetrieval | mrr_at_100 | 63.068000000000005 |
Retrieval | C_MTEB/EcomRetrieval | mrr_at_1000 | 63.08200000000001 |
Retrieval | C_MTEB/EcomRetrieval | mrr_at_3 | 60.050000000000004 |
Retrieval | C_MTEB/EcomRetrieval | mrr_at_5 | 61.41 |
Retrieval | C_MTEB/EcomRetrieval | ndcg_at_1 | 53.300000000000004 |
Retrieval | C_MTEB/EcomRetrieval | ndcg_at_10 | 67.31700000000001 |
Retrieval | C_MTEB/EcomRetrieval | ndcg_at_100 | 69.862 |
Retrieval | C_MTEB/EcomRetrieval | ndcg_at_1000 | 70.231 |
Retrieval | C_MTEB/EcomRetrieval | ndcg_at_3 | 62.222 |
Retrieval | C_MTEB/EcomRetrieval | ndcg_at_5 | 64.66300000000001 |
Retrieval | C_MTEB/EcomRetrieval | precision_at_1 | 53.300000000000004 |
Retrieval | C_MTEB/EcomRetrieval | precision_at_10 | 8.260000000000002 |
Retrieval | C_MTEB/EcomRetrieval | precision_at_100 | 0.941 |
Retrieval | C_MTEB/EcomRetrieval | precision_at_1000 | 0.097 |
Retrieval | C_MTEB/EcomRetrieval | precision_at_3 | 22.833000000000002 |
Retrieval | C_MTEB/EcomRetrieval | precision_at_5 | 14.879999999999999 |
Retrieval | C_MTEB/EcomRetrieval | recall_at_1 | 53.300000000000004 |
Retrieval | C_MTEB/EcomRetrieval | recall_at_10 | 82.6 |
Retrieval | C_MTEB/EcomRetrieval | recall_at_100 | 94.1 |
Retrieval | C_MTEB/EcomRetrieval | recall_at_1000 | 97.0 |
Retrieval | C_MTEB/EcomRetrieval | recall_at_3 | 68.5 |
Retrieval | C_MTEB/EcomRetrieval | recall_at_5 | 74.4 |
Retrieval | C_MTEB/MMarcoRetrieval | map_at_1 | 70.68799999999999 |
Retrieval | C_MTEB/MMarcoRetrieval | map_at_10 | 79.28399999999999 |
Retrieval | C_MTEB/MMarcoRetrieval | map_at_100 | 79.537 |
Retrieval | C_MTEB/MMarcoRetrieval | map_at_1000 | 79.545 |
Retrieval | C_MTEB/MMarcoRetrieval | map_at_3 | 77.643 |
Retrieval | C_MTEB/MMarcoRetrieval | map_at_5 | 78.694 |
Retrieval | C_MTEB/MMarcoRetrieval | mrr_at_1 | 73.05199999999999 |
Retrieval | C_MTEB/MMarcoRetrieval | mrr_at_10 | 79.794 |
Retrieval | C_MTEB/MMarcoRetrieval | mrr_at_100 | 80.024 |
Retrieval | C_MTEB/MMarcoRetrieval | mrr_at_1000 | 80.03099999999999 |
Retrieval | C_MTEB/MMarcoRetrieval | mrr_at_3 | 78.441 |
Retrieval | C_MTEB/MMarcoRetrieval | mrr_at_5 | 79.29 |
Retrieval | C_MTEB/MMarcoRetrieval | ndcg_at_1 | 73.05199999999999 |
Retrieval | C_MTEB/MMarcoRetrieval | ndcg_at_10 | 82.627 |
Retrieval | C_MTEB/MMarcoRetrieval | ndcg_at_100 | 83.737 |
Retrieval | C_MTEB/MMarcoRetrieval | ndcg_at_1000 | 83.946 |
Retrieval | C_MTEB/MMarcoRetrieval | ndcg_at_3 | 79.585 |
Retrieval | C_MTEB/MMarcoRetrieval | ndcg_at_5 | 81.306 |
Retrieval | C_MTEB/MMarcoRetrieval | precision_at_1 | 73.05199999999999 |
Retrieval | C_MTEB/MMarcoRetrieval | precision_at_10 | 9.835 |
Retrieval | C_MTEB/MMarcoRetrieval | precision_at_100 | 1.038 |
Retrieval | C_MTEB/MMarcoRetrieval | precision_at_1000 | 0.106 |
Retrieval | C_MTEB/MMarcoRetrieval | precision_at_3 | 29.756 |
Retrieval | C_MTEB/MMarcoRetrieval | precision_at_5 | 18.788 |
Retrieval | C_MTEB/MMarcoRetrieval | recall_at_1 | 70.68799999999999 |
Retrieval | C_MTEB/MMarcoRetrieval | recall_at_10 | 92.38300000000001 |
Retrieval | C_MTEB/MMarcoRetrieval | recall_at_100 | 97.347 |
Retrieval | C_MTEB/MMarcoRetrieval | recall_at_1000 | 98.992 |
Retrieval | C_MTEB/MMarcoRetrieval | recall_at_3 | 84.37 |
Retrieval | C_MTEB/MMarcoRetrieval | recall_at_5 | 88.434 |
Retrieval | C_MTEB/MedicalRetrieval | map_at_1 | 53.1 |
Retrieval | C_MTEB/MedicalRetrieval | map_at_10 | 58.36599999999999 |
Retrieval | C_MTEB/MedicalRetrieval | map_at_100 | 58.939 |
Retrieval | C_MTEB/MedicalRetrieval | map_at_1000 | 58.99100000000001 |
Retrieval | C_MTEB/MedicalRetrieval | map_at_3 | 57.15 |
Retrieval | C_MTEB/MedicalRetrieval | map_at_5 | 57.794999999999995 |
Retrieval | C_MTEB/MedicalRetrieval | mrr_at_1 | 53.2 |
Retrieval | C_MTEB/MedicalRetrieval | mrr_at_10 | 58.416000000000004 |
Retrieval | C_MTEB/MedicalRetrieval | mrr_at_100 | 58.989999999999995 |
Retrieval | C_MTEB/MedicalRetrieval | mrr_at_1000 | 59.041 |
Retrieval | C_MTEB/MedicalRetrieval | mrr_at_3 | 57.199999999999996 |
Retrieval | C_MTEB/MedicalRetrieval | mrr_at_5 | 57.845 |
Retrieval | C_MTEB/MedicalRetrieval | ndcg_at_1 | 53.1 |
Retrieval | C_MTEB/MedicalRetrieval | ndcg_at_10 | 60.989000000000004 |
Retrieval | C_MTEB/MedicalRetrieval | ndcg_at_100 | 63.967 |
Retrieval | C_MTEB/MedicalRetrieval | ndcg_at_1000 | 65.436 |
Retrieval | C_MTEB/MedicalRetrieval | ndcg_at_3 | 58.425000000000004 |
Retrieval | C_MTEB/MedicalRetrieval | ndcg_at_5 | 59.583 |
Retrieval | C_MTEB/MedicalRetrieval | precision_at_1 | 53.1 |
Retrieval | C_MTEB/MedicalRetrieval | precision_at_10 | 6.93 |
Retrieval | C_MTEB/MedicalRetrieval | precision_at_100 | 0.8370000000000001 |
Retrieval | C_MTEB/MedicalRetrieval | precision_at_1000 | 0.096 |
Retrieval | C_MTEB/MedicalRetrieval | precision_at_3 | 20.7 |
Retrieval | C_MTEB/MedicalRetrieval | precision_at_5 | 12.98 |
Retrieval | C_MTEB/MedicalRetrieval | recall_at_1 | 53.1 |
Retrieval | C_MTEB/MedicalRetrieval | recall_at_10 | 69.3 |
Retrieval | C_MTEB/MedicalRetrieval | recall_at_100 | 83.7 |
Retrieval | C_MTEB/MedicalRetrieval | recall_at_1000 | 95.5 |
Retrieval | C_MTEB/MedicalRetrieval | recall_at_3 | 62.1 |
Retrieval | C_MTEB/MedicalRetrieval | recall_at_5 | 64.9 |
Reranking | C-MTEB/Mmarco-reranking | map | 33.548800108363665 |
Reranking | C-MTEB/Mmarco-reranking | mrr | 32.529761904761905 |
Reranking | C-MTEB/T2Reranking | map | 69.43381583724414 |
Reranking | C-MTEB/T2Reranking | mrr | 80.47879657392181 |
Retrieval | C_MTEB/T2Retrieval | map_at_1 | 28.116000000000003 |
Retrieval | C_MTEB/T2Retrieval | map_at_10 | 80.026 |
Retrieval | C_MTEB/T2Retrieval | map_at_100 | 83.541 |
Retrieval | C_MTEB/T2Retrieval | map_at_1000 | 83.592 |
Retrieval | C_MTEB/T2Retrieval | map_at_3 | 56.092 |
Retrieval | C_MTEB/T2Retrieval | map_at_5 | 69.114 |
Retrieval | C_MTEB/T2Retrieval | mrr_at_1 | 91.557 |
Retrieval | C_MTEB/T2Retrieval | mrr_at_10 | 93.73700000000001 |
Retrieval | C_MTEB/T2Retrieval | mrr_at_100 | 93.808 |
Retrieval | C_MTEB/T2Retrieval | mrr_at_1000 | 93.811 |
Retrieval | C_MTEB/T2Retrieval | mrr_at_3 | 93.384 |
Retrieval | C_MTEB/T2Retrieval | mrr_at_5 | 93.614 |
Retrieval | C_MTEB/T2Retrieval | ndcg_at_1 | 91.553 |
Retrieval | C_MTEB/T2Retrieval | ndcg_at_10 | 87.003 |
Retrieval | C_MTEB/T2Retrieval | ndcg_at_100 | 90.128 |
Retrieval | C_MTEB/T2Retrieval | ndcg_at_1000 | 90.615 |
Retrieval | C_MTEB/T2Retrieval | ndcg_at_3 | 88.205 |
Retrieval | C_MTEB/T2Retrieval | ndcg_at_5 | 86.978 |
Retrieval | C_MTEB/T2Retrieval | precision_at_1 | 91.553 |
Retrieval | C_MTEB/T2Retrieval | precision_at_10 | 43.25 |
Retrieval | C_MTEB/T2Retrieval | precision_at_100 | 5.067 |
Retrieval | C_MTEB/T2Retrieval | precision_at_1000 | 0.518 |
Retrieval | C_MTEB/T2Retrieval | precision_at_3 | 77.25 |
Retrieval | C_MTEB/T2Retrieval | precision_at_5 | 64.902 |
Retrieval | C_MTEB/T2Retrieval | recall_at_1 | 28.116000000000003 |
Retrieval | C_MTEB/T2Retrieval | recall_at_10 | 85.994 |
Retrieval | C_MTEB/T2Retrieval | recall_at_100 | 96.345 |
Retrieval | C_MTEB/T2Retrieval | recall_at_1000 | 98.867 |
Retrieval | C_MTEB/T2Retrieval | recall_at_3 | 57.67099999999999 |
Retrieval | C_MTEB/T2Retrieval | recall_at_5 | 72.26 |
Retrieval | C_MTEB/VideoRetrieval | map_at_1 | 64.9 |
Retrieval | C_MTEB/VideoRetrieval | map_at_10 | 73.763 |
Retrieval | C_MTEB/VideoRetrieval | map_at_100 | 74.116 |
Retrieval | C_MTEB/VideoRetrieval | map_at_1000 | 74.12100000000001 |
Retrieval | C_MTEB/VideoRetrieval | map_at_3 | 72.15 |
Retrieval | C_MTEB/VideoRetrieval | map_at_5 | 73.25 |
Retrieval | C_MTEB/VideoRetrieval | mrr_at_1 | 64.9 |
Retrieval | C_MTEB/VideoRetrieval | mrr_at_10 | 73.763 |
Retrieval | C_MTEB/VideoRetrieval | mrr_at_100 | 74.116 |
Retrieval | C_MTEB/VideoRetrieval | mrr_at_1000 | 74.12100000000001 |
Retrieval | C_MTEB/VideoRetrieval | mrr_at_3 | 72.15 |
Retrieval | C_MTEB/VideoRetrieval | mrr_at_5 | 73.25 |
Retrieval | C_MTEB/VideoRetrieval | ndcg_at_1 | 64.9 |
Retrieval | C_MTEB/VideoRetrieval | ndcg_at_10 | 77.639 |
Retrieval | C_MTEB/VideoRetrieval | ndcg_at_100 | 79.396 |
Retrieval | C_MTEB/VideoRetrieval | ndcg_at_1000 | 79.554 |
Retrieval | C_MTEB/VideoRetrieval | ndcg_at_3 | 74.406 |
Retrieval | C_MTEB/VideoRetrieval | ndcg_at_5 | 76.385 |
Retrieval | C_MTEB/VideoRetrieval | precision_at_1 | 64.9 |
Retrieval | C_MTEB/VideoRetrieval | precision_at_10 | 8.959999999999999 |
Retrieval | C_MTEB/VideoRetrieval | precision_at_100 | 0.979 |
Retrieval | C_MTEB/VideoRetrieval | precision_at_1000 | 0.099 |
Retrieval | C_MTEB/VideoRetrieval | precision_at_3 | 26.967000000000002 |
Retrieval | C_MTEB/VideoRetrieval | precision_at_5 | 17.14 |
Retrieval | C_MTEB/VideoRetrieval | recall_at_1 | 64.9 |
Retrieval | C_MTEB/VideoRetrieval | recall_at_10 | 89.60000000000001 |
Retrieval | C_MTEB/VideoRetrieval | recall_at_100 | 97.89999999999999 |
Retrieval | C_MTEB/VideoRetrieval | recall_at_1000 | 99.2 |
Retrieval | C_MTEB/VideoRetrieval | recall_at_3 | 80.9 |
Retrieval | C_MTEB/VideoRetrieval | recall_at_5 | 85.7 |
Jina Embeddings V3
Jina Embeddings V3 is a multilingual sentence embedding model supporting more than 100 languages, focused on sentence similarity and feature extraction tasks.
Text Embedding · Transformers · Multilingual
jinaai · 3.7M · 911
Ms Marco MiniLM L6 V2 (Apache-2.0)
A cross-encoder model trained on the MS MARCO passage ranking task, used for query-passage relevance scoring in information retrieval.
Text Embedding · English
cross-encoder · 2.5M · 86
Opensearch Neural Sparse Encoding Doc V2 Distill (Apache-2.0)
A distillation-based sparse retrieval model optimized for OpenSearch, supporting inference-free document encoding and outperforming V1 in search relevance and efficiency.
Text Embedding · Transformers · English
opensearch-project · 1.8M · 7
Sapbert From PubMedBERT Fulltext (Apache-2.0)
A biomedical entity representation model based on PubMedBERT, using self-alignment pretraining to better capture semantic relations.
Text Embedding · English
cambridgeltl · 1.7M · 49
Gte Large (MIT)
GTE-Large is a powerful sentence-transformer model focused on sentence similarity and text embedding tasks, with strong results on multiple benchmarks.
Text Embedding · English
thenlper · 1.5M · 278
Gte Base En V1.5 (Apache-2.0)
GTE-base-en-v1.5 is an English sentence-transformer model focused on sentence similarity, performing well on multiple text embedding benchmarks.
Text Embedding · Transformers · Multilingual
Alibaba-NLP · 1.5M · 63
Gte Multilingual Base (Apache-2.0)
GTE Multilingual Base is a multilingual sentence embedding model supporting more than 50 languages, suitable for tasks such as sentence similarity.
Text Embedding · Transformers · Multilingual
Alibaba-NLP · 1.2M · 246
Polybert
polyBERT is a chemical language model aiming at fully machine-driven, ultrafast polymer informatics. It maps PSMILES strings to 600-dimensional dense fingerprints that numerically represent polymer chemical structures.
Text Embedding · Transformers
kuelumbus · 1.0M · 5
Bert Base Turkish Cased Mean Nli Stsb Tr (Apache-2.0)
A Turkish-BERT-based sentence embedding model optimized for semantic similarity tasks.
Text Embedding · Transformers · Other
emrecan · 1.0M · 40
GIST Small Embedding V0 (MIT)
A text embedding model fine-tuned from BAAI/bge-small-en-v1.5, trained on the MEDI dataset together with MTEB classification task datasets, improving query encoding for retrieval tasks.
Text Embedding · Safetensors · English
avsolatorio · 945.68k · 29
Featured AI Models
Llama 3 Typhoon V1.5x 8b Instruct
An 8-billion-parameter instruction model built for Thai, with performance comparable to GPT-3.5-turbo, optimized for application scenarios, retrieval-augmented generation, constrained generation, and reasoning tasks.
Large Language Model · Transformers · Multilingual
scb10x · 3,269 · 16
Cadet Tiny (Openrail)
Cadet-Tiny is an ultra-small dialogue model trained on the SODA dataset, designed for inference on edge devices, at roughly 2% of the size of the Cosmo-3B model.
Dialogue Systems · Transformers · English
ToddGoldfarb · 2,691 · 6
Roberta Base Chinese Extractive Qa
A Chinese extractive question-answering model based on the RoBERTa architecture, suited to extracting answers from a given passage.
Question Answering · Chinese
uer · 2,694 · 98