# 🚀 Model Evaluation Results
This document presents the evaluation results of a model based on Qwen/Qwen2-VL-2B-Instruct, covering multiple tasks and datasets.
## 📚 Documentation

### Model Information
| Property | Details |
|----------|---------|
| Base Model | Qwen/Qwen2-VL-2B-Instruct |
| Supported Languages | English, Chinese |
| Tags | mteb, sentence-transformers, transformers, Qwen2-VL, sentence-similarity, vidore |
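
Since the tags list sentence-transformers, the model presumably exposes the standard `encode()` interface. A minimal usage sketch follows; note that `"your-org/your-model"` is a placeholder, since the card names only the base model, not the fine-tuned repository id:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Placeholder id: the card does not state the fine-tuned model's repository.
# trust_remote_code may or may not be required, depending on how it is packaged.
model = SentenceTransformer("your-org/your-model", trust_remote_code=True)

sentences = ["The cat sits on the mat.", "A feline rests on a rug."]
embeddings = model.encode(sentences)

# 2x2 cosine-similarity matrix between the two sentences.
print(cos_sim(embeddings, embeddings))
```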
### Evaluation Results
The model has been evaluated on a range of tasks, including Semantic Textual Similarity (STS), Classification, Retrieval, Clustering, and Reranking. Detailed results are listed below.
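
As a rough guide to reproducing any single entry, the sketch below assumes the API of recent versions of the `mteb` Python package and again uses a placeholder model id:

```python
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("your-org/your-model")  # placeholder id

# Pick one task from the tables below, e.g. Banking77.
tasks = mteb.get_tasks(tasks=["Banking77Classification"])
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results")
```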
#### 1. STS Tasks
| Dataset | cos_sim_pearson | cos_sim_spearman | euclidean_pearson | euclidean_spearman | manhattan_pearson | manhattan_spearman |
|---------|-----------------|------------------|-------------------|--------------------|-------------------|--------------------|
| C-MTEB/AFQMC (validation) | 61.03190209456061 | 67.54853383020948 | 65.38958681599493 | 67.54853383020948 | 65.25341659273157 | 67.34190190683134 |
| C-MTEB/ATEC (test) | 50.83794357648487 | 54.03230997664373 | 55.2072028123375 | 54.032311102613264 | 55.05163232251946 | 53.81272176804127 |
| mteb/biosses-sts (test) | 89.18568151905953 | 86.47666922475281 | 87.25416218056225 | 86.47666922475281 | 87.04960508086356 | 86.73992823533615 |
| C-MTEB/BQ (test) | 75.7464284612374 | 77.71894224189296 | 77.63454068918787 | 77.71894224189296 | 77.58744810404339 | 77.63293552726073 |
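
The six columns follow the usual MTEB STS convention: a similarity score is computed per sentence pair and correlated (Pearson/Spearman) with the human gold scores. A sketch of the metric definitions, as an illustration rather than the exact evaluation code:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def sts_metrics(emb1: np.ndarray, emb2: np.ndarray, gold: np.ndarray) -> dict:
    """Correlate pairwise similarity scores with gold STS annotations."""
    # Cosine similarity per sentence pair.
    cos = np.sum(emb1 * emb2, axis=1) / (
        np.linalg.norm(emb1, axis=1) * np.linalg.norm(emb2, axis=1)
    )
    # Negated distances, so that larger = more similar, matching the gold scale.
    euclid = -np.linalg.norm(emb1 - emb2, axis=1)
    manhattan = -np.sum(np.abs(emb1 - emb2), axis=1)
    return {
        "cos_sim_pearson": pearsonr(cos, gold)[0] * 100,
        "cos_sim_spearman": spearmanr(cos, gold)[0] * 100,
        "euclidean_pearson": pearsonr(euclid, gold)[0] * 100,
        "euclidean_spearman": spearmanr(euclid, gold)[0] * 100,
        "manhattan_pearson": pearsonr(manhattan, gold)[0] * 100,
        "manhattan_spearman": spearmanr(manhattan, gold)[0] * 100,
    }
```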
#### 2. Classification Tasks
| Dataset | Accuracy | AP | F1 |
|---------|----------|----|----|
| mteb/amazon_counterfactual (en, test) | 72.55223880597015 | 35.01515316721116 | 66.44086070814382 |
| mteb/amazon_polarity (test) | 96.75819999999999 | 95.51009242092881 | 96.75713119357414 |
| mteb/amazon_reviews_multi (en, test) | 61.971999999999994 | - | 60.50745575187704 |
| mteb/amazon_reviews_multi (zh, test) | 53.49 | - | 51.576550662258434 |
| mteb/banking77 (test) | 80.2435064935065 | - | 79.44078343737895 |
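
Accuracy, AP, and F1 presumably come from the standard MTEB classification protocol, which fits a linear probe on frozen embeddings; this is an assumption, and the F1 averaging mode is not stated in the card. A minimal sketch:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, average_precision_score, f1_score

def linear_probe_scores(train_emb, train_y, test_emb, test_y):
    """Fit a linear classifier on frozen embeddings and score the test split."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(train_emb, train_y)
    pred = clf.predict(test_emb)
    out = {
        "accuracy": accuracy_score(test_y, pred) * 100,
        # Macro averaging is assumed; the card does not specify.
        "f1": f1_score(test_y, pred, average="macro") * 100,
    }
    # AP appears only for the binary tasks above, consistent with this check.
    if len(set(test_y)) == 2:
        proba = clf.predict_proba(test_emb)[:, 1]
        out["ap"] = average_precision_score(test_y, proba) * 100
    return out
```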
#### 3. Retrieval Tasks
| Dataset | map_at_1 | map_at_10 | map_at_100 | map_at_1000 | map_at_3 | map_at_5 | mrr_at_1 | mrr_at_10 | mrr_at_100 | mrr_at_1000 | mrr_at_3 | mrr_at_5 | ndcg_at_1 | ndcg_at_10 | ndcg_at_100 | ndcg_at_1000 | ndcg_at_3 | ndcg_at_5 | precision_at_1 | precision_at_10 | precision_at_100 | precision_at_1000 | precision_at_3 | precision_at_5 | recall_at_1 | recall_at_10 | recall_at_100 | recall_at_1000 | recall_at_3 | recall_at_5 |
|---------|----------|-----------|------------|-------------|----------|----------|-----------|------------|-------------|--------------|-----------|-----------|-----------|------------|-------------|--------------|-----------|-----------|--------------|---------------|----------------|-----------------|--------------|--------------|-----------|------------|-------------|--------------|--------------|--------------|--------------|--------------|
| mteb/arguana (test) | 36.272999999999996 | 52.782 | 53.339999999999996 | 53.342999999999996 | 48.4 | 50.882000000000005 | 36.984 | 53.052 | 53.604 | 53.607000000000006 | 48.613 | 51.159 | 36.272999999999996 | 61.524 | 63.796 | 63.869 | 52.456 | 56.964000000000006 | 36.272999999999996 | 8.926 | 0.989 | 0.1 | 21.407999999999998 | 15.049999999999999 | 36.272999999999996 | 89.25999999999999 | 98.933 | 99.502 | 64.225 | 75.249 |
| BeIR/cqadupstack (Android, test) | 30.623 | 40.482 | 41.997 | 42.135 | 37.754 | 39.031 | 37.482 | 46.311 | 47.211999999999996 | 47.27 | 44.157999999999994 | 45.145 | 37.482 | 46.142 | 51.834 | 54.164 | 42.309000000000005 | 43.485 | 37.482 | 8.455 | 1.3780000000000001 | 0.188 | 20.172 | 13.705 | 30.623 | 56.77100000000001 | 80.034 | 94.62899999999999 | 44.663000000000004 | 48.692 |
| BeIR/cqadupstack (English, test) | 27.941 | 38.437 | 39.625 | 39.753 | 35.388999999999996 | 37.113 | 34.522000000000006 | 43.864999999999995 | 44.533 | 44.580999999999996 | 41.55 | 42.942 | 34.522000000000006 | 44.330000000000005 | 48.61 | 50.712999999999994 | 39.834 | 42.016 | 34.522000000000006 | 8.471 | 1.3379999999999999 | 0.182 | 19.363 | 13.898 | 27.941 | 55.336 | 73.51100000000001 | 86.636 | 42.54 | 48.392 |
| BeIR/cqadupstack (Gaming, test) | 32.681 | 45.48 | 46.542 | 46.604 | 42.076 | 44.076 | 37.492 | 48.746 | 49.485 | 49.517 | 45.998 | 47.681000000000004 | 37.492 | 51.778999999999996 | 56.294 | 57.58 | 45.856 | 48.968 | 37.492 | 8.620999999999999 | - | - | - | - | 32.681 | - | - | - | - | - |
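
The columns are the standard rank-based retrieval metrics at cutoffs 1/3/5/10/100/1000. As a reminder of the definitions, here is a small illustrative sketch of recall@k and precision@k over cosine-ranked documents (not the exact BEIR/MTEB evaluation code):

```python
import numpy as np

def recall_precision_at_k(query_emb, doc_embs, relevant_ids, k=10):
    """Rank documents by cosine similarity and score the top-k cutoff."""
    scores = doc_embs @ query_emb / (
        np.linalg.norm(doc_embs, axis=1) * np.linalg.norm(query_emb)
    )
    top_k = set(np.argsort(-scores)[:k].tolist())
    hits = len(top_k & set(relevant_ids))
    return hits / len(relevant_ids), hits / k  # recall@k, precision@k
```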
#### 4. Clustering Tasks
| Dataset | v_measure |
|---------|-----------|
| mteb/arxiv-clustering-p2p (test) | 52.45236368396085 |
| mteb/arxiv-clustering-s2s (test) | 46.83781937870832 |
| mteb/biorxiv-clustering-p2p (test) | 44.68220155432257 |
| mteb/biorxiv-clustering-s2s (test) | 40.666150477589284 |
| C-MTEB/CLSClusteringP2P (test) | 44.23533333311907 |
| C-MTEB/CLSClusteringS2S (test) | 43.01114481307774 |
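
v_measure is scikit-learn's V-measure between predicted clusters and gold labels; MTEB's clustering tasks typically obtain the clusters with MiniBatchKMeans, which is assumed in this sketch:

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import v_measure_score

def v_measure(embeddings, labels):
    """Cluster embeddings and compare the assignment to the gold labels."""
    km = MiniBatchKMeans(n_clusters=len(set(labels)))
    pred = km.fit_predict(embeddings)
    return v_measure_score(labels, pred) * 100
```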
#### 5. Reranking Tasks
| Dataset | MAP | MRR |
|---------|-----|-----|
| mteb/askubuntudupquestions-reranking (test) | 60.653430349851746 | 74.28736314470387 |
| C-MTEB/CMedQAv1-reranking (test) | 86.4349853821696 | 88.80150793650795 |
| C-MTEB/CMedQAv2-reranking (test) | 87.56417400982208 | 89.85813492063491 |
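
MRR (mean reciprocal rank) averages, over queries, the reciprocal rank of the first relevant candidate; MAP averages precision over all relevant positions. A definition-level sketch of MRR:

```python
def mean_reciprocal_rank(ranked_relevance):
    """ranked_relevance: one 0/1 relevance list per query, in ranked order."""
    total = 0.0
    for flags in ranked_relevance:
        # Reciprocal rank of the first relevant item, or 0 if none is relevant.
        total += next((1.0 / i for i, rel in enumerate(flags, start=1) if rel), 0.0)
    return total / len(ranked_relevance) * 100

# Example: hits at rank 2 and rank 1 -> (0.5 + 1.0) / 2 = 75.0
print(mean_reciprocal_rank([[0, 1, 0], [1, 0, 0]]))
```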
## 📄 License
This project is licensed under the Apache-2.0 license.