# 🚀 piccolo-base-zh
This document presents the performance metrics of the piccolo-base-zh model on MTEB (Massive Text Embedding Benchmark) tasks: semantic textual similarity (STS), classification, clustering, reranking, retrieval, and pair classification.
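For context, all of these scores are computed from the model's dense sentence embeddings. Below is a minimal sketch of producing such embeddings with the `sentence-transformers` library; the Hub repo ID `sensenova/piccolo-base-zh`, the example sentences, and the use of normalized embeddings are illustrative assumptions, not details taken from this document.

```python
# Minimal sketch: embed two Chinese sentences and compare them by cosine
# similarity. The Hub repo ID below is an assumption for illustration.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sensenova/piccolo-base-zh")  # assumed repo ID

# "The weather is great today" / "The weather is nice today"
sentences = ["今天天气真好", "今天天气不错"]
embeddings = model.encode(sentences, normalize_embeddings=True)

# Cosine similarity between the two sentence embeddings.
print(float(util.cos_sim(embeddings[0], embeddings[1])))
```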
## 📚 Documentation

### Model Performance Metrics

All scores in the tables below are percentages.
#### 1. Semantic Textual Similarity (STS)

| Task | Dataset | Split | Cosine Similarity (Pearson) | Cosine Similarity (Spearman) | Euclidean Distance (Pearson) | Euclidean Distance (Spearman) | Manhattan Distance (Pearson) | Manhattan Distance (Spearman) |
|------|---------|-------|------|------|------|------|------|------|
| STS | MTEB AFQMC | validation | 49.17 | 51.40 | 49.86 | 51.50 | 49.75 | 51.41 |
| STS | MTEB ATEC | test | 52.39 | 52.59 | 54.99 | 52.54 | 54.95 | 52.51 |
| STS | MTEB BQ | test | 60.99 | 62.68 | 61.09 | 62.42 | 61.15 | 62.49 |
| STS | MTEB LCQMC | test | 70.03 | 75.40 | 73.93 | 75.55 | 73.88 | 75.51 |
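To make the columns above concrete: for each STS dataset, both sentences of every pair are embedded, a similarity (or negative distance) score is computed per pair, and the Pearson and Spearman correlations between those scores and the human-annotated gold ratings are reported. A toy sketch, with random embeddings standing in for real model output:

```python
# Toy illustration of the STS columns: correlate model similarity scores
# with gold human ratings. Random embeddings stand in for real output.
import numpy as np
from scipy.stats import pearsonr, spearmanr

gold = np.array([4.5, 1.0, 3.2, 0.5])  # human similarity ratings per pair

rng = np.random.default_rng(0)
emb1 = rng.normal(size=(4, 768))  # embeddings of the first sentence of each pair
emb2 = rng.normal(size=(4, 768))  # embeddings of the second sentence of each pair

# Cosine similarity per pair.
cosine = (emb1 * emb2).sum(1) / (
    np.linalg.norm(emb1, axis=1) * np.linalg.norm(emb2, axis=1)
)
# Distance-based variants correlate the gold ratings with the *negative*
# distance, so that larger values still mean "more similar".
neg_euclidean = -np.linalg.norm(emb1 - emb2, axis=1)
neg_manhattan = -np.abs(emb1 - emb2).sum(1)

for name, scores in [("cosine", cosine),
                     ("euclidean", neg_euclidean),
                     ("manhattan", neg_manhattan)]:
    print(name, pearsonr(gold, scores)[0], spearmanr(gold, scores)[0])
```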
#### 2. Classification

| Task | Dataset | Split | Accuracy | F1 | AP |
|------|---------|-------|----------|----|----|
| Classification | MTEB AmazonReviewsClassification (zh) | test | 40.24 | 39.43 | - |
| Classification | MTEB IFlyTek | validation | 44.35 | 36.40 | - |
| Classification | MTEB JDReview | test | 84.26 | 78.55 | 50.55 |
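For the classification tasks, the usual MTEB-style protocol (an assumption here, not stated in this document) is to freeze the embeddings and fit a lightweight classifier such as logistic regression on top, then score its predictions. A toy sketch of computing Accuracy and F1 this way:

```python
# Toy illustration of frozen-embedding classification scoring.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 768)), rng.integers(0, 5, 200)
X_test, y_test = rng.normal(size=(50, 768)), rng.integers(0, 5, 50)

# Fit a lightweight classifier on top of the (frozen) embeddings.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = clf.predict(X_test)

print("accuracy:", accuracy_score(y_test, pred))
print("F1:", f1_score(y_test, pred, average="macro"))  # averaging choice is an assumption
```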
#### 3. Clustering

| Task | Dataset | Split | V-Measure |
|------|---------|-------|-----------|
| Clustering | MTEB CLSClusteringP2P | test | 38.36 |
| Clustering | MTEB CLSClusteringS2S | test | 35.65 |
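V-measure, the single clustering metric above, is the harmonic mean of homogeneity and completeness: it scores how well clusters discovered over the embeddings agree with the gold category labels. A toy sketch using k-means (the choice of `MiniBatchKMeans` and the toy data are illustrative assumptions):

```python
# Toy illustration of V-measure: cluster the embeddings, then compare the
# predicted cluster assignments to the gold labels.
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import v_measure_score

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(300, 768))
gold_labels = rng.integers(0, 10, 300)

kmeans = MiniBatchKMeans(n_clusters=10, n_init=3, random_state=0)
pred_labels = kmeans.fit_predict(embeddings)

print("V-measure:", v_measure_score(gold_labels, pred_labels))
```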
#### 4. Reranking

| Task | Dataset | Split | MAP | MRR |
|------|---------|-------|-----|-----|
| Reranking | MTEB CMedQAv1 | test | 85.25 | 87.77 |
| Reranking | MTEB CMedQAv2 | test | 86.15 | 88.54 |
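In reranking, each query comes with a mix of positive and negative candidate documents; the candidates are ordered by embedding similarity to the query, and MAP and MRR summarize how highly the positives are ranked. As a worked illustration, here is a self-contained MRR over already-ranked relevance labels (toy data):

```python
# Toy illustration of MRR: the average of 1/rank of the first relevant
# candidate, over queries whose candidates are already ranked.
def mean_reciprocal_rank(rankings):
    """rankings: one list of 0/1 relevance labels per query, in ranked order."""
    total = 0.0
    for labels in rankings:
        for rank, relevant in enumerate(labels, start=1):
            if relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)

# First relevant hit at rank 2 for query 1, rank 1 for query 2:
print(mean_reciprocal_rank([[0, 1, 0], [1, 0, 0]]))  # (1/2 + 1/1) / 2 = 0.75
```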
#### 5. Retrieval

All retrieval results are on the dev split; "-" marks values not reported for MTEB MMarcoRetrieval.

| Metric | MTEB CmedqaRetrieval | MTEB CovidRetrieval | MTEB DuRetrieval | MTEB EcomRetrieval | MTEB MMarcoRetrieval |
|--------|------|------|------|------|------|
| MAP@1 | 23.683 | 72.998 | 26.537 | 48.5 | 64.739 |
| MAP@10 | 35.523 | 81.271 | 81.292 | 57.898 | 74.039 |
| MAP@100 | 37.456 | 81.534 | 84.031 | 58.599 | 74.38 |
| MAP@1000 | 37.576 | 81.535 | 84.066 | 58.616 | 74.391 |
| MAP@3 | 31.584 | 80.049 | 56.571 | 55.1 | 72.074 |
| MAP@5 | 33.685 | 80.793 | 71.082 | 56.805 | 73.293 |
| MRR@1 | 36.459 | 73.13 | 91.2 | 48.5 | 66.92 |
| MRR@10 | 44.534 | 81.193 | 93.893 | 57.898 | 74.636 |
| MRR@100 | 45.6 | 81.463 | 93.955 | 58.599 | 74.94 |
| MRR@1000 | 45.647 | 81.464 | 93.957 | 58.616 | 74.95 |
| MRR@3 | 42.186 | 80.067 | 93.617 | 55.1 | 72.911 |
| MRR@5 | 43.482 | 80.741 | 93.767 | 56.805 | 73.981 |
| NDCG@1 | 36.459 | 73.34 | 91.2 | 48.5 | 66.92 |
| NDCG@10 | 42.025 | 84.503 | 88.255 | 62.876 | 77.924 |
| NDCG@100 | 49.754 | 85.643 | 90.813 | 66.002 | 79.471 |
| NDCG@1000 | 51.816 | 85.693 | 91.144 | 66.467 | 79.734 |
| NDCG@3 | 37.056 | 82.135 | 87.435 | 57.162 | 74.172 |
| NDCG@5 | 38.962 | 83.401 | 85.961 | 60.264 | 76.236 |
| Precision@1 | 36.459 | 73.34 | 91.2 | 48.5 | 66.92 |
| Precision@10 | 9.485 | 9.536 | 42.14 | 7.87 | 9.5 |
| Precision@100 | 1.567 | 1.004 | 4.817 | 0.927 | - |
| Precision@1000 | 0.183 | 0.101 | 0.489 | 0.096 | - |
| Precision@3 | 21.13 | 29.54 | 78.467 | 21.033 | - |
| Precision@5 | 15.209 | 18.398 | 65.76 | 14.14 | - |
| Recall@1 | 23.683 | 72.998 | 26.537 | 48.5 | - |
| Recall@10 | 52.191 | 94.31 | 89.262 | 78.7 | - |
| Recall@100 | 84.491 | 99.368 | 97.783 | 92.7 | - |
| Recall@1000 | 98.196 | 99.789 | 99.498 | 96.4 | - |
| Recall@3 | 37.09 | 87.935 | 58.573 | 63.1 | - |
| Recall@5 | 43.262 | 90.991 | 75.154 | 70.7 | - |
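The retrieval rows follow the standard cut-off convention: each metric @k is computed over the top-k documents retrieved per query (ranked by embedding similarity) and averaged across queries. A toy sketch of Recall@k and binary-relevance NDCG@k (the helper functions and example documents are illustrative, not MTEB's actual evaluator):

```python
# Toy illustration of Recall@k and binary-relevance NDCG@k.
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    hits = sum(1 for doc in ranked_ids[:k] if doc in relevant_ids)
    return hits / len(relevant_ids)

def ndcg_at_k(ranked_ids, relevant_ids, k):
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(ranked_ids[:k]) if doc in relevant_ids)
    ideal = sum(1.0 / math.log2(i + 2)
                for i in range(min(k, len(relevant_ids))))
    return dcg / ideal

ranked = ["d3", "d1", "d7", "d2"]  # system ranking for one query
relevant = {"d1", "d2"}            # gold relevant documents

print("Recall@3:", recall_at_k(ranked, relevant, 3))  # 1 of 2 relevant found -> 0.5
print("NDCG@3:", ndcg_at_k(ranked, relevant, 3))
```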
#### 6. Pair Classification

All results are for the PairClassification task on MTEB Cmnli, validation split; "-" marks metrics not reported for the Max row.

| Similarity Measure | Accuracy | AP | F1 | Precision | Recall |
|--------------------|----------|----|----|-----------|--------|
| Cosine Similarity | 74.20 | 82.33 | 76.64 | 68.59 | 86.84 |
| Dot Product | 70.33 | 77.47 | 73.67 | 62.85 | 88.99 |
| Euclidean Distance | 74.78 | 82.66 | 77.18 | 71.05 | 84.48 |
| Manhattan Distance | 74.77 | 82.56 | 77.18 | 69.54 | 86.70 |
| Max | 74.78 | 82.66 | 77.18 | - | - |
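Pair classification turns each similarity or distance measure into a binary same-meaning decision by thresholding it; to the best of my understanding of the MTEB protocol, Accuracy and F1 are reported at the best threshold, AP is threshold-free, and the Max row takes the best value across all measures. A toy sketch of the best-threshold search for cosine similarity (the data and the exhaustive search are illustrative assumptions):

```python
# Toy illustration of best-threshold pair classification over cosine
# similarities; AP is computed threshold-free from the raw scores.
import numpy as np
from sklearn.metrics import accuracy_score, average_precision_score, f1_score

sims = np.array([0.91, 0.35, 0.78, 0.12, 0.66])  # cosine similarity per pair
labels = np.array([1, 0, 1, 0, 1])               # 1 = positive (same meaning)

best_acc, best_t = 0.0, 0.0
for t in np.unique(sims):  # exhaustive search over candidate thresholds
    acc = accuracy_score(labels, (sims >= t).astype(int))
    if acc > best_acc:
        best_acc, best_t = acc, t

print("best threshold:", best_t, "accuracy:", best_acc)
print("F1 at best threshold:", f1_score(labels, (sims >= best_t).astype(int)))
print("AP:", average_precision_score(labels, sims))
```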