Stella Base Zh V2
Developed by infgrad
stella-base-zh-v2 is a Chinese semantic similarity model built on Sentence Transformers, supporting a wide range of text similarity tasks and evaluation benchmarks.
Downloads: 95
Release Time: 10/13/2023
Model Overview
This model is mainly used for semantic similarity calculation, feature extraction, and multi-task evaluation of Chinese texts, and is suitable for various scenarios such as financial Q&A and natural language inference.
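The model can be used as a standard sentence-transformers encoder. A minimal sketch, assuming the checkpoint is published on the Hugging Face Hub as `infgrad/stella-base-zh-v2` and that `sentence-transformers` is installed:

```python
# Minimal usage sketch (unofficial): the Hub id "infgrad/stella-base-zh-v2"
# is assumed from the model name and author shown above.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("infgrad/stella-base-zh-v2")

sentences = ["今天天气不错", "今天天气很好"]
embeddings = model.encode(sentences, normalize_embeddings=True)

# Cosine similarity between the two sentences
print(float(util.cos_sim(embeddings[0], embeddings[1])))
```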
Model Features
Multi-task evaluation support
Supports various Chinese text similarity tasks and evaluation benchmarks, including Ant Financial Q&A, ATEC, BQ, etc.
High-performance semantic similarity calculation
Performs well on multiple datasets, especially in financial Q&A and natural language inference tasks.
Support for multiple distance metrics
Supports multiple similarity measures, including cosine similarity, Euclidean distance, and Manhattan distance (see the sketch below).
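These three measures map directly onto standard scipy routines; a minimal sketch over two embedding vectors (the vectors and their dimension here are illustrative stand-ins, not model output):

```python
# Sketch: the three similarity/distance measures over two embedding
# vectors, using scipy's standard distance functions.
import numpy as np
from scipy.spatial.distance import cityblock, cosine, euclidean

a = np.random.rand(768)  # stand-ins for two sentence embeddings;
b = np.random.rand(768)  # the dimension is illustrative

print("cosine similarity: ", 1 - cosine(a, b))  # cosine() returns a distance
print("euclidean distance:", euclidean(a, b))
print("manhattan distance:", cityblock(a, b))
```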
Model Capabilities
Semantic text similarity calculation
Text feature extraction
Text pair classification
Text clustering
Retrieval tasks
Re-ranking tasks
Use Cases
Financial field
Financial Q&A system
Used in financial Q&A systems to calculate the semantic similarity between questions and answers.
Performs well on the Ant Financial Q&A dataset
Medical field
Medical Q&A retrieval
Used in medical Q&A retrieval systems.
Mean average precision (MAP) reaches 84.69 on the CMedQAv1 reranking dataset
E-commerce field
Product review classification
Used for product review classification on e-commerce platforms.
Achieves 39.64% accuracy on the Amazon Chinese review classification task
🚀 stella-base-zh-v2
This is a model for sentence similarity tasks. It has been evaluated on multiple datasets from C-MTEB and MTEB, and the results below cover STS, Classification, Clustering, Reranking, Retrieval, and PairClassification tasks.
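As a sketch of how such an evaluation is typically run with the `mteb` package (the task names are taken from the tables below; the exact API varies between mteb versions, so treat this as an assumption rather than the author's exact script):

```python
# Sketch, not the official evaluation script: runs a few of the C-MTEB
# tasks listed below via mteb's classic interface.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("infgrad/stella-base-zh-v2")
evaluation = MTEB(tasks=["ATEC", "LCQMC", "AFQMC"])  # task names assumed
evaluation.run(model, output_folder="results/stella-base-zh-v2")
```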
📚 Documentation
Model Information
Property | Details |
---|---|
Pipeline Tag | sentence-similarity |
Tags | sentence-transformers, feature-extraction, sentence-similarity, mteb |
Model Name | stella-base-zh-v2 |
Results on Datasets
1. STS Tasks
- C-MTEB/AFQMC (Validation Split)

Metric | Value |
---|---|
cos_sim_pearson | 44.62083443545288 |
cos_sim_spearman | 46.72814628391134 |
euclidean_pearson | 45.11522093816821 |
euclidean_spearman | 46.72818648900957 |
manhattan_pearson | 44.98820754682395 |
manhattan_spearman | 46.63576705524296 |

- C-MTEB/ATEC (Test Split)

Metric | Value |
---|---|
cos_sim_pearson | 49.543902370260234 |
cos_sim_spearman | 51.22161152883018 |
euclidean_pearson | 53.49586541060596 |
euclidean_spearman | 51.22161490583934 |
manhattan_pearson | 53.51023339947787 |
manhattan_spearman | 51.22426632538443 |

- C-MTEB/LCQMC (Test Split)

Metric | Value |
---|---|
cos_sim_pearson | 70.2307617475436 |
cos_sim_spearman | 76.88912653700545 |
euclidean_pearson | 75.47976675486538 |
euclidean_spearman | 76.88912210059333 |
manhattan_pearson | 75.45834919257487 |
manhattan_spearman | 76.8669208121889 |

- C-MTEB/PAWSX (Test Split)

Metric | Value |
---|---|
cos_sim_pearson | 24.231255568030463 |
cos_sim_spearman | 29.6964906904186 |
euclidean_pearson | 30.166130502867016 |
euclidean_spearman | 29.69614167804371 |
manhattan_pearson | 30.166606116745935 |
manhattan_spearman | 29.62681453661945 |

- C-MTEB/QBQTC (Test Split)

Metric | Value |
---|---|
cos_sim_pearson | 34.88835755574809 |
cos_sim_spearman | 37.3797926051053 |
euclidean_pearson | 35.46629492698549 |
euclidean_spearman | 37.37987510604593 |
manhattan_pearson | 35.4953353526957 |
manhattan_spearman | 37.41397231689605 |

- mteb/sts22-crosslingual-sts (Test Split, zh Config)

Metric | Value |
---|---|
cos_sim_pearson | 67.79575721136626 |
cos_sim_spearman | 69.02068400784196 |
euclidean_pearson | 68.30675023447176 |
euclidean_spearman | 69.02068400784196 |
manhattan_pearson | 69.912842597 |
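As a reference for reading these tables, cos_sim_spearman is the Spearman correlation (scaled to 0-100) between the model's cosine similarities and human-annotated scores. A minimal sketch, with illustrative sentence pairs and gold scores that are not dataset content:

```python
# Sketch: how a cos_sim_spearman figure is computed.
from scipy.stats import spearmanr
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("infgrad/stella-base-zh-v2")
pairs = [("今天天气不错", "今天天气很好"),
         ("我要申请退款", "如何申请贷款"),
         ("他在看书", "他正在读书")]
gold = [4.5, 0.5, 4.0]  # human similarity annotations (stand-ins)

emb_a = model.encode([p[0] for p in pairs])
emb_b = model.encode([p[1] for p in pairs])
sims = [float(util.cos_sim(a, b)) for a, b in zip(emb_a, emb_b)]

rho, _ = spearmanr(sims, gold)
print(rho * 100)  # reported on a 0-100 scale
```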
2. Classification Tasks
- C-MTEB/IFlyTek-classification (Validation Split)

Metric | Value |
---|---|
accuracy | 47.895344363216616 |
f1 | 34.95151253165417 |

- C-MTEB/JDReview-classification (Test Split)

Metric | Value |
---|---|
accuracy | 84.78424015009381 |
ap | 52.436279969597685 |
f1 | 79.49258679392281 |

- C-MTEB/MultilingualSentiment-classification (Validation Split)

Metric | Value |
---|---|
accuracy | 71.66666666666666 |
f1 | 70.92944632461379 |

- C-MTEB/OnlineShopping-classification (Test Split)

Metric | Value |
---|---|
accuracy | 91.38 |
ap | 89.12250244489272 |
f1 | 91.36604511107015 |

- mteb/amazon_massive_intent (Test Split, zh-CN Config)

Metric | Value |
---|---|
accuracy | 68.08675184936112 |
f1 | 65.51466585063827 |

- mteb/amazon_massive_scenario (Test Split, zh-CN Config)

Metric | Value |
---|---|
accuracy | 73.22461331540013 |
f1 | 72.675432030145 |
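Classification scores of this kind are obtained by training a lightweight classifier on frozen embeddings. A minimal sketch, with illustrative texts and labels (the choice of logistic regression is an assumption about the protocol, not a documented detail of these runs):

```python
# Sketch: accuracy/f1 from a simple classifier over frozen embeddings.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("infgrad/stella-base-zh-v2")
train_texts, train_labels = ["很好用", "太差了", "还不错", "不推荐"], [1, 0, 1, 0]
test_texts, test_labels = ["质量不错", "非常失望"], [1, 0]

clf = LogisticRegression(max_iter=1000)
clf.fit(model.encode(train_texts), train_labels)
pred = clf.predict(model.encode(test_texts))

print(accuracy_score(test_labels, pred))
print(f1_score(test_labels, pred, average="macro"))
```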
3. Clustering Tasks
- C-MTEB/CLSClusteringP2P (Test Split)

Metric | Value |
---|---|
v_measure | 40.25857488823951 |

- C-MTEB/CLSClusteringS2S (Test Split)

Metric | Value |
---|---|
v_measure | 37.17501553349549 |
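The v_measure numbers come from clustering the embeddings and comparing the assignment against gold topic labels. A minimal sketch with illustrative texts and stand-in labels:

```python
# Sketch: v_measure from k-means clusters over sentence embeddings.
from sklearn.cluster import KMeans
from sklearn.metrics import v_measure_score
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("infgrad/stella-base-zh-v2")
texts = ["篮球比赛精彩回顾", "足球联赛积分榜", "股票大盘走势", "基金收益分析"]
gold = [0, 0, 1, 1]  # sports vs. finance (stand-in labels)

pred = KMeans(n_clusters=2, n_init=10).fit_predict(model.encode(texts))
print(v_measure_score(gold, pred) * 100)
```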
4. Reranking Tasks
- C-MTEB/CMedQAv1-reranking (Test Split)

Metric | Value |
---|---|
map | 84.69751849160603 |
mrr | 87.16257936507937 |

- C-MTEB/CMedQAv2-reranking (Test Split)

Metric | Value |
---|---|
map | 85.31468551417655 |
mrr | 87.74658730158731 |

- C-MTEB/Mmarco-reranking (Dev Split)

Metric | Value |
---|---|
map | 28.047948482579244 |
mrr | 26.63809523809524 |
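Bi-encoder reranking scores every candidate passage against the query by cosine similarity and sorts; map and mrr are then computed over the resulting order. A minimal sketch, with an illustrative query and candidates:

```python
# Sketch: reranking candidates by cosine similarity to the query.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("infgrad/stella-base-zh-v2")
query = "高血压吃什么药"
candidates = ["常用降压药物介绍", "今日股市行情", "感冒的家庭护理"]

q_emb = model.encode(query, normalize_embeddings=True)
c_emb = model.encode(candidates, normalize_embeddings=True)
scores = util.cos_sim(q_emb, c_emb)[0].tolist()

for text, score in sorted(zip(candidates, scores), key=lambda x: -x[1]):
    print(f"{score:.4f}  {text}")
```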
5. Retrieval Tasks
- C-MTEB/CmedqaRetrieval (Dev Split)

Metric | @1 | @3 | @5 | @10 | @100 | @1000 |
---|---|---|---|---|---|---|
map | 24.181 | 31.679 | 33.854 | 35.615 | 37.444 | 37.573 |
mrr | 37.109 | 42.256 | 43.497 | 44.653 | 45.647 | 45.701 |
ndcg | 37.109 | 37.017 | 38.997 | 42.029 | 49.292 | 51.64 |
precision | 37.109 | 20.93 | 15.269 | 9.387 | 1.536 | 0.183 |
recall | 24.181 | 36.73 | 42.884 | 51.962 | 82.122 | 98.059 |

- C-MTEB/CovidRetrieval (Dev Split)

Metric | @1 | @3 | @5 | @10 | @100 | @1000 |
---|---|---|---|---|---|---|
map | 67.439 | 74.412 | 75.177 | 76.001 | 76.297 | 76.299 |
mrr | 67.65 | 74.464 | 75.265 | 76.007 | 76.322 | 76.324 |
ndcg | 67.65 | 76.576 | 77.956 | 79.856 | 81.344 | 81.442 |
precision | 67.65 | 27.749 | 17.345 | 9.283 | 0.999 | 0.101 |
recall | 67.439 | 82.719 | 86.038 | 91.781 | 98.841 | 99.684 |

- C-MTEB/DuRetrieval (Dev Split)

Metric | @1 | @3 | @5 | @10 | @100 | @1000 |
---|---|---|---|---|---|---|
map | 25.354 | 54.664 | 69.378 | 79.499 | 82.416 | 82.451 |
mrr | 89.25 | 92.342 | 92.562 | 92.666 | 92.738 | 92.74 |
ndcg | 89.25 | 85.476 | 84.679 | 86.97 | 89.736 | 90.069 |
precision | 89.25 | 76.867 | 65.25 | 41.9 | 4.811 | 0.489 |
recall | 25.354 | 57.325 | 74.614 | 88.65 | 97.56 | 99.37 |

- C-MTEB/EcomRetrieval (Dev Split)

Metric | @1 | @3 | @5 | @10 | @100 | @1000 |
---|---|---|---|---|---|---|
map | 48.3 | 54.883 | 56.673 | 57.765 | 58.418 | 58.439 |
mrr | 48.3 | 54.883 | 56.673 | 57.765 | 58.418 | 58.439 |
ndcg | 48.3 | 56.996 | 60.215 | 62.846 | 65.845 | 66.369 |
precision | 48.3 | 21.033 | 14.18 | 7.9 | 0.926 | 0.097 |
recall | 48.3 | 63.1 | 70.9 | 79.0 | 92.6 | 96.7 |

- C-MTEB/MedicalRetrieval (Dev Split)

Metric | @1 | @3 | @5 | @10 | @100 | @1000 |
---|---|---|---|---|---|---|
map | 49.2 | 53.733 | 54.778 | 55.394 | 55.883 | 55.939 |
mrr | 49.3 | 53.783 | 54.828 | 55.444 | 55.933 | 55.989 |
ndcg | 49.2 | 55.143 | 57.032 | 58.502 | 61.181 | 62.848 |
precision | 49.2 | 19.733 | 12.76 | 6.83 | 0.815 | 0.095 |
recall | 49.2 | 59.2 | 63.8 | 68.3 | 81.5 | 95.0 |

- C-MTEB/MMarcoRetrieval (Dev Split)

Metric | @1 | @3 | @5 | @10 | @100 | @1000 |
---|---|---|---|---|---|---|
map | 65.837 | 72.832 | 74.07 | 74.72 | 75.068 | 75.079 |
mrr | 68.009 | 73.677 | 74.742 | 75.294 | 75.607 | 75.617 |
ndcg | 68.009 | 74.825 | 76.9 | 78.36 | 79.911 | 80.226 |
precision | 68.009 | 28.075 | 17.951 | 9.463 | 1.023 | 0.105 |
recall | 65.837 | 79.698 | 84.623 | 89.001 | 95.968 | 98.461 |
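The retrieval setup embeds a corpus once and then ranks documents per query; the cutoff metrics above (ndcg@k, recall@k, and so on) are computed over that ranking. A minimal sketch, with an illustrative corpus and query:

```python
# Sketch: ranking a small corpus per query with semantic_search.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("infgrad/stella-base-zh-v2")
corpus = ["新冠疫苗接种注意事项", "胃炎患者的饮食建议", "手机电池保养方法"]
corpus_emb = model.encode(corpus, normalize_embeddings=True)

query_emb = model.encode("接种疫苗前需要注意什么", normalize_embeddings=True)
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.4f}  {corpus[hit['corpus_id']]}")
```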
6. PairClassification Tasks
- C-MTEB/CMNLI (Validation Split)

Similarity | Accuracy | AP | F1 | Precision | Recall |
---|---|---|---|---|---|
cos_sim | 76.23571858087793 | 84.75290046905519 | 77.70114942528735 | 73.05475504322767 | 82.97872340425532 |
dot | 76.23571858087793 | 84.75113928508674 | 77.70114942528735 | 73.05475504322767 | 82.97872340425532 |
euclidean | 76.23571858087793 | 84.75289931658567 | 77.70114942528735 | 73.05475504322767 | 82.97872340425532 |
manhattan | 76.17558628983764 | 84.75764676597448 | 77.734375 | 72.52480259161773 | 83.75029226093056 |
max | 76.23571858087793 | 84.75764676597448 | 77.734375 | - | - |

- C-MTEB/OCNLI (Validation Split)

Similarity | Accuracy | AP | F1 | Precision | Recall |
---|---|---|---|---|---|
cos_sim | 70.00541418516514 | 75.16499510773514 | 73.09435517099301 | 59.932432432432435 | 93.66420274551214 |
dot | 70.00541418516514 | 75.16499510773514 | 73.09435517099301 | 59.932432432432435 | 93.66420274551214 |
euclidean | 70.00541418516514 | 75.16499510773514 | 73.09435517099301 | 59.932432432432435 | 93.66420274551214 |
manhattan | 70.11369788846778 | 75.1259071890593 | 72.91399229781771 | 61.294964028776974 | 89.96832101372756 |
max | 70.11369788846778 | 75.16499510773514 | 73.09435517099301 | - | - |
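Pair-classification metrics treat the similarity score as a binary classifier over sentence pairs, which is where the AP columns above come from. A minimal sketch with illustrative pairs and stand-in labels:

```python
# Sketch: a cos_sim_ap-style number from cosine scores over pairs.
from sklearn.metrics import average_precision_score
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("infgrad/stella-base-zh-v2")
pairs = [("他在看书", "他正在读书"),
         ("他在看书", "她在跑步"),
         ("天空很蓝", "天空湛蓝")]
labels = [1, 0, 1]  # 1 = same meaning (stand-in annotations)

emb_a = model.encode([p[0] for p in pairs])
emb_b = model.encode([p[1] for p in pairs])
scores = [float(util.cos_sim(a, b)) for a, b in zip(emb_a, emb_b)]

print(average_precision_score(labels, scores) * 100)
```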
Featured Recommended AI Models

Jina Embeddings V3
Jina Embeddings V3 is a multilingual sentence embedding model supporting over 100 languages, specializing in sentence similarity and feature extraction tasks.
Tags: Text Embedding, Transformers, Supports Multiple Languages
By jinaai | Downloads: 3.7M | Likes: 911
Ms Marco MiniLM L6 V2
License: Apache-2.0
A cross-encoder model trained on the MS MARCO passage ranking task for query-passage relevance scoring in information retrieval.
Tags: Text Embedding, English
By cross-encoder | Downloads: 2.5M | Likes: 86
Opensearch Neural Sparse Encoding Doc V2 Distill
License: Apache-2.0
A sparse retrieval model based on distillation technology, optimized for OpenSearch, supporting inference-free document encoding with improved search relevance and efficiency over V1.
Tags: Text Embedding, Transformers, English
By opensearch-project | Downloads: 1.8M | Likes: 7
Sapbert From PubMedBERT Fulltext
License: Apache-2.0
A biomedical entity representation model based on PubMedBERT, optimized for semantic relation capture through self-aligned pre-training.
Tags: Text Embedding, English
By cambridgeltl | Downloads: 1.7M | Likes: 49
Gte Large
License: MIT
GTE-Large is a powerful sentence transformer model focused on sentence similarity and text embedding tasks, excelling in multiple benchmark tests.
Tags: Text Embedding, English
By thenlper | Downloads: 1.5M | Likes: 278
Gte Base En V1.5
License: Apache-2.0
GTE-base-en-v1.5 is an English sentence transformer model focused on sentence similarity tasks, excelling in multiple text embedding benchmarks.
Tags: Text Embedding, Transformers, Supports Multiple Languages
By Alibaba-NLP | Downloads: 1.5M | Likes: 63
Gte Multilingual Base
License: Apache-2.0
GTE Multilingual Base is a multilingual sentence embedding model supporting over 50 languages, suitable for tasks like sentence similarity calculation.
Tags: Text Embedding, Transformers, Supports Multiple Languages
By Alibaba-NLP | Downloads: 1.2M | Likes: 246
Polybert
polyBERT is a chemical language model designed for fully machine-driven, ultrafast polymer informatics. It maps PSMILES strings to 600-dimensional dense fingerprints that numerically represent polymer chemical structures.
Tags: Text Embedding, Transformers
By kuelumbus | Downloads: 1.0M | Likes: 5
Bert Base Turkish Cased Mean Nli Stsb Tr
License: Apache-2.0
A sentence embedding model based on Turkish BERT, optimized for semantic similarity tasks.
Tags: Text Embedding, Transformers, Other
By emrecan | Downloads: 1.0M | Likes: 40
GIST Small Embedding V0
License: MIT
A text embedding model fine-tuned from BAAI/bge-small-en-v1.5, trained on the MEDI dataset and MTEB classification task datasets, with query encoding optimized for retrieval tasks.
Tags: Text Embedding, Safetensors, English
By avsolatorio | Downloads: 945.68k | Likes: 29