stella-large-zh-v2 Open Source Chinese Model - Free Sentence Similarity Calculation and Text Classification

Stella Large Zh V2

Developed by infgrad

stella-large-zh-v2 is a Chinese text embedding model built for sentence similarity. It supports semantic textual similarity and text classification tasks.

Text Embedding

PyTorch

#Financial Q&A Similarity #Multi-task Evaluation #Chinese Semantic Understanding

Downloads 259

Release Time: 10/13/2023

Model Overview

This model is mainly used for sentence similarity calculation, text classification, text clustering, re-ranking, and retrieval. It has been evaluated on a range of Chinese benchmark datasets, mostly from C-MTEB; the scores are listed under Evaluation Results.

Model Features

Support for Multi-task Evaluation Benchmarks

Evaluated on Chinese multi-task benchmark datasets (MTEB and C-MTEB), covering semantic textual similarity, text classification, pair classification, text clustering, re-ranking, and retrieval.

High-performance Sentence Similarity Calculation

Evaluated for sentence similarity on datasets such as AFQMC (Ant Financial Q&A), ATEC, and BQ (Bank Q&A). Correlations with the gold labels are reported for three measures: cosine similarity, Euclidean distance, and Manhattan distance.
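The three measures named above can be illustrated on toy vectors. This is a minimal sketch in plain Python; the function names and the 3-dimensional vectors are made up for illustration and are not part of the model or its evaluation code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (higher means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line (L2) distance (lower means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (L1; lower means more similar)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical toy embeddings; real sentence embeddings have far more dimensions.
u = [1.0, 2.0, 2.0]
v = [2.0, 4.0, 4.0]

print(cosine_similarity(u, v))
print(euclidean_distance(u, v))
print(manhattan_distance(u, v))
```

Note that `u` and `v` point in the same direction, so their cosine similarity is maximal even though both distances are non-zero. When embeddings are normalized to unit length, cosine similarity and Euclidean distance produce the same ranking.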

Powerful Re-ranking Ability

In the CMedQAv1 and CMedQAv2 re-ranking tasks, both MAP (mean average precision) and MRR (mean reciprocal rank) exceed 85. Results are much lower on Mmarco-reranking (MAP 27.97).

Model Capabilities

Sentence Similarity Calculation

Text Classification

Text Clustering

Re-ranking

Retrieval

Use Cases

Financial Field

Financial Q&A System

Used in the Q&A system of the financial field to calculate the similarity between user questions and candidate answers.

On the Ant Financial Q&A dataset (C-MTEB/AFQMC, validation), cosine similarity reaches a Pearson correlation of 47.34 and a Spearman correlation of 49.95.
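For readers unfamiliar with these two statistics, the sketch below computes Pearson and Spearman correlation between similarity scores and gold labels in plain Python. The scores and labels are invented, and the helper names are illustrative. The reported figures are presumably correlations scaled by 100.

```python
import math


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical model similarities and binary gold labels for four sentence pairs.
scores = [0.10, 0.40, 0.35, 0.80]
labels = [0, 0, 1, 1]

print(100 * pearson(scores, labels))
print(100 * spearman(scores, labels))
```

Spearman only depends on the ordering of the scores, which is why it is often preferred for comparing embedding models whose similarity scales differ.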

Bank Customer Service Q&A

Used for Q&A matching and similarity calculation in the bank customer service system.

On the bank Q&A dataset (C-MTEB/BQ, test), cosine similarity reaches a Pearson correlation of 62.84 and a Spearman correlation of 65.54.

Medical Field

Medical Q&A Re-ranking

Used for answer re-ranking in the medical Q&A system to improve the relevance of answers.

In the CMedQAv1 and CMedQAv2 re-ranking tasks, MAP (mean average precision) is 85.45 and 85.83 respectively, and MRR (mean reciprocal rank) is 88.14 and 88.53.
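Re-ranking quality is summarized by MAP and MRR in the Evaluation Results. The sketch below shows one common way to compute both from ranked relevance flags. The two toy queries are invented, and the functions assume every relevant candidate appears somewhere in the ranked list, as is the case when a fixed candidate set is re-ordered.

```python
def average_precision(relevance):
    """Mean of precision values taken at each relevant position."""
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def reciprocal_rank(relevance):
    """1 / rank of the first relevant item, or 0 if there is none."""
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def mean(values):
    return sum(values) / len(values)


# Hypothetical re-ranked candidate lists: 1 = relevant answer, 0 = irrelevant.
queries = [
    [1, 0, 1, 0],
    [0, 1, 0, 0],
]

map_score = mean([average_precision(q) for q in queries])
mrr_score = mean([reciprocal_rank(q) for q in queries])
print(map_score, mrr_score)
```

MRR only rewards the position of the first relevant answer, while MAP also accounts for where the remaining relevant answers land.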

COVID-19 Information Retrieval

Used for the retrieval and ranking of COVID-19 related information.

In the COVID-19 retrieval task (C-MTEB/CovidRetrieval, dev), MAP@1 is 68.86 and MAP@10 is 77.11.

General Text Processing

Text Classification

Used for general text classification tasks, such as Amazon review classification.

In the Chinese Amazon review classification task (mteb/amazon_reviews_multi, test), accuracy is 40.82 and F1 is 39.02.
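Accuracy and F1 can diverge on multi-class data, as they do in the figures above. The sketch below computes both on invented labels. It assumes macro averaging for F1 (an unweighted mean over classes), which is an assumption about the benchmark rather than something stated here.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical star ratings (gold) and classifier predictions.
y_true = [1, 1, 2, 3, 3, 3]
y_pred = [1, 2, 2, 3, 3, 1]

print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Because macro F1 weights every class equally, weak performance on a rare class pulls it below accuracy.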

Text Clustering

Used for text clustering tasks, such as the CLS paragraph-to-paragraph (P2P) and sentence-to-sentence (S2S) clustering benchmarks.

On CLSClusteringP2P the V-measure is 39.95; on CLSClusteringS2S it is 38.18.
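V-measure is the harmonic mean of homogeneity (each cluster contains one class) and completeness (each class falls in one cluster). The sketch below implements that definition in plain Python on invented labels. It is meant to illustrate the metric, not to reproduce the benchmark's exact evaluation code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) in nats."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(classes, clusters):
    """Harmonic mean of homogeneity and completeness, in [0, 1]."""
    h_classes = entropy(classes)
    h_clusters = entropy(clusters)
    homogeneity = 1.0 if h_classes == 0 else 1 - conditional_entropy(classes, clusters) / h_classes
    completeness = 1.0 if h_clusters == 0 else 1 - conditional_entropy(clusters, classes) / h_clusters
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Hypothetical gold topics and predicted cluster ids for four documents.
gold = [0, 0, 1, 1]
perfect = [1, 1, 0, 0]      # same grouping, different cluster names
one_cluster = [0, 0, 0, 0]  # everything lumped together

print(v_measure(gold, perfect))
print(v_measure(gold, one_cluster))
```

The metric ignores cluster naming, so a clustering that recovers the gold grouping under different ids still scores as perfect.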

🚀 stella-large-zh-v2

This model is designed for sentence similarity tasks. It can be used through sentence-transformers for feature extraction and has been evaluated on a range of datasets from the MTEB benchmark.

📚 Documentation

Model Information

Property	Details
Pipeline Tag	sentence-similarity
Tags	sentence-transformers, feature-extraction, sentence-similarity, mteb
Model Name	stella-large-zh-v2
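Since the tags list sentence-transformers, the model can presumably be loaded through that library. The sketch below is a hedged example, not official usage. The repository id `infgrad/stella-large-zh-v2` is inferred from the developer and model names on this page, the helpers `load_model`, `rank_by_cosine`, and `demo` are hypothetical, and the sample sentences are invented.

```python
import math

# Assumed Hugging Face repo id (developer name + model name); not confirmed by this page.
MODEL_ID = "infgrad/stella-large-zh-v2"


def load_model(model_id=MODEL_ID):
    """Load the model via sentence-transformers (imported lazily; downloads weights)."""
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(model_id)


def rank_by_cosine(query_vec, candidate_vecs):
    """Return candidate indices sorted by descending cosine similarity to the query."""
    def cosine(a, b):
        dot = sum(float(x) * float(y) for x, y in zip(a, b))
        norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
        norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
        return dot / (norm_a * norm_b)

    scores = [cosine(query_vec, c) for c in candidate_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def demo():
    """Embed a query and two candidates, then rank the candidates."""
    model = load_model()
    sentences = ["如何开通花呗", "花呗怎么开通", "今天天气不错"]
    vectors = model.encode(sentences, normalize_embeddings=True)
    return rank_by_cosine(vectors[0], vectors[1:])
```

Calling `demo()` requires the `sentence-transformers` package and network access to fetch the weights. `rank_by_cosine` works on any list of vectors, so it can be tried without the model.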

Evaluation Results

1. STS (Semantic Textual Similarity) Tasks

Dataset	Cosine Similarity Pearson	Cosine Similarity Spearman	Euclidean Pearson	Euclidean Spearman	Manhattan Pearson	Manhattan Spearman
C-MTEB/AFQMC (validation)	47.34436411023816	49.947084806624545	48.128834319004824	49.947064694876815	48.083561270166484	49.90207128584442
C-MTEB/ATEC (test)	50.97998570817664	53.11852606980578	55.12610520736481	53.11852832108405	55.10299116717361	53.11304196536268
C-MTEB/BQ (test)	62.83544115057508	65.53509404838948	64.08198144850084	65.53509404760305	64.08808420747272	65.54907862648346
C-MTEB/LCQMC (test)	71.17149643620844	77.48040046337948	76.32337539923347	77.4804004621894	76.33275226275444	77.48979843086128
C-MTEB/PAWSX (test)	30.10755999973859	36.221732138848864	36.41120179336658	36.221731188009436	36.34865300346968	36.17696483080459
C-MTEB/QBQTC (test)	36.778975708100226	38.733929926753724	37.13383498228113	38.73374886550868	37.175732896552404	38.74120541657908
mteb/sts22-crosslingual-sts (test, zh)	65.97095922825076	68.87452938308421	67.23101642424429	68.87452938308421	67.29909334410189	68.89807985930508
C-MTEB/STSB	To be completed	To be completed	To be completed	To be completed	To be completed	To be completed

2. Classification Tasks

Dataset	Accuracy	F1	AP (if available)
mteb/amazon_reviews_multi (test, zh)	40.81799999999999	39.022194031906444	N/A
C-MTEB/IFlyTek-classification (validation)	47.98768757214313	35.23884426992269	N/A
C-MTEB/JDReview-classification (test)	86.97936210131333	81.87001614762136	56.292679530375736
mteb/amazon_massive_intent (test, zh-CN)	68.26160053799597	65.96949453305112	N/A
mteb/amazon_massive_scenario (test, zh-CN)	73.12037659717554	72.69052407105445	N/A
C-MTEB/MultilingualSentiment-classification (validation)	73.41000000000001	72.87768282499509	N/A
C-MTEB/OnlineShopping-classification (test)	91.81000000000002	91.79220350456818	89.35809579688139

3. Clustering Tasks

Dataset	V-Measure
C-MTEB/CLSClusteringP2P (test)	39.95428546140963
C-MTEB/CLSClusteringS2S (test)	38.18454393512963

4. Reranking Tasks

Dataset	MAP	MRR
C-MTEB/CMedQAv1-reranking (test)	85.4453602559479	88.1418253968254
C-MTEB/CMedQAv2-reranking (test)	85.82731720256984	88.53230158730159
C-MTEB/Mmarco-reranking (dev)	27.966807589556826	26.92023809523809

5. Retrieval Tasks

| Dataset | MAP@1 | MAP@3 | MAP@5 | MAP@10 | MAP@100 | MAP@1000 | MRR@1 | MRR@3 | MRR@5 | MRR@10 | MRR@100 | MRR@1000 | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 | NDCG@100 | NDCG@1000 | Precision@1 | Precision@3 | Precision@5 | Precision@10 | Precision@100 | Precision@1000 | Recall@1 | Recall@3 | Recall@5 | Recall@10 | Recall@100 | Recall@1000 |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| C-MTEB/CmedqaRetrieval (dev) | 24.459 | 32.356 | 34.499 | 36.274 | 38.168 | 38.292 | 37.584 | 42.919000000000004 | 44.283 | 45.323 | 46.361999999999995 | 46.412 | 37.584 | 37.808 | 39.711999999999996 | 42.63 | 50.114000000000004 | 52.312000000000005 | 37.584 | 21.505 | 15.514 | 9.51 | 1.554 | 0.183 | 24.459 | 37.553 | 43.712 | 52.32 | 83.423 | 98.247 |
| C-MTEB/CovidRetrieval (dev) | 68.862 | 75.457 | 76.254 | 77.106 | 77.455 | 77.459 | 69.125 | 75.606 | 76.29599999999999 | 77.13799999999999 | 77.488 | 77.492 | 69.02000000000001 | 77.472 | 78.892 | 80.81099999999999 | 82.298 | 82.403 | 69.02000000000001 | 27.924 | 17.492 | 9.336 | 0.9990000000000001 | 0.101 | 68.862 | 83.193 | 86.617 | 92.308 | 98.84100000000001 | 99.684 |
| C-MTEB/DuRetrieval (dev) | 25.063999999999997 | 53.613 | 68.008 | 78.02 | 81.022 | 81.06 | 87.8 | 91.508 | 91.758 | 91.827 | 91.913 | 91.915 | 87.8 | 84.191 | 83.433 | 85.753 | 88.82900000000001 | 89.208 | 87.8 | 75.767 | 64.25999999999999 | 41.33 | 4.8 | 0.48900000000000005 | 25.063999999999997 | 56.259 | 73.505 | 87.357 | 97.261 | 99.309 |
| C-MTEB/EcomRetrieval (dev) | 46.800000000000004 | 54.167 | 55.822 | 56.898 | 57.567 | 57.593 | 46.800000000000004 | 54.167 | 55.822 | 56.898 | 57.567 | 57.593 | 46.800000000000004 | 56.54 | 59.492999999999995 | 62.07 | 65.049 | 65.666 | 46.800000000000004 | 21.133 | 14.099999999999998 | 7.84 | 0.9169999999999999 | 0.096 | 46.800000000000004 | 63.4 | 70.5 | 78.4 | 91.7 | 96.39999999999999 |
| C-MTEB/MMarcoRetrieval (dev) | 66.15100000000001 | 73.26700000000001 | 74.39 | 75.048 | 75.374 | 75.386 | 68.381 | 74.1 | 75.053 | 75.644 | 75.929 | 75.93900000000001 | 68.381 | 75.3 | 77.172 | 78.669 | 80.161 | 80.46799999999999 | 68.381 | 28.299999999999997 | 17.98 | 9.48 | 1.023 | 0.105 | 66.15100000000001 | 80.318 | 84.761 | 89.238 | 96.032 | 98.437 |
| C-MTEB/MedicalRetrieval (dev) | 50.1 | 55.25 | 55.93 | 56.489999999999995 | 57.007 | 57.06400000000001 | 50.3 | 55.35 | 56.03 | 56.591 | 57.108000000000004 | 57.165 | 50.1 | 56.813 | 58.044 | 59.419999999999995 | 62.28900000000001 | 63.9 | 50.1 | 20.433 | 12.86 | 6.859999999999999 | 0.828 | 0.096 | 50.1 | 61.3 | 64.3 | 68.60000000000001 | 82.8 | 95.7 |
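The retrieval table reports metrics at several cutoffs k. As a reading aid, the sketch below shows how Precision@k, Recall@k, and NDCG@k are commonly defined for binary relevance. The document ids and relevance set are invented, and the benchmark's own implementation may differ in details such as graded relevance.

```python
import math


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents found in the top k."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Discounted cumulative gain at k, normalized by the ideal ordering."""
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, d in enumerate(ranked_ids[:k])
        if d in relevant_ids
    )
    ideal_hits = min(k, len(relevant_ids))
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


# Hypothetical ranked results for one query and its set of relevant documents.
ranked = ["d3", "d1", "d7", "d2", "d5"]
relevant = {"d1", "d2"}

print(precision_at_k(ranked, relevant, 3))
print(recall_at_k(ranked, relevant, 3))
print(ndcg_at_k(ranked, relevant, 3))
```

This also explains a pattern visible in the table: Precision@k falls as k grows because each query has few relevant documents, while Recall@k rises toward 100.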

6. Pair Classification Tasks

Dataset	Cosine Similarity Accuracy	Cosine Similarity AP	Cosine Similarity F1	Cosine Similarity Precision	Cosine Similarity Recall	Dot Accuracy	Dot AP	Dot F1	Dot Precision	Dot Recall	Euclidean Accuracy	Euclidean AP	Euclidean F1	Euclidean Precision	Euclidean Recall	Manhattan Accuracy	Manhattan AP	Manhattan F1	Manhattan Precision	Manhattan Recall	Max Accuracy	Max AP	Max F1
C-MTEB/CMNLI (validation)	77.7269993986771	86.8488070512359	79.32095490716179	72.6107226107226	87.39770867430443	77.7269993986771	86.84218333157476	79.32095490716179	72.6107226107226	87.39770867430443	77.7269993986771	86.84880910178296	79.32095490716179	72.6107226107226	87.39770867430443	77.82321106434155	86.8152244713786	79.43262411347519	72.5725338491296	87.72504091653029	77.82321106434155	86.84880910178296	79.43262411347519
C-MTEB/OCNLI (validation)	73.4163508391987	78.51058998215277	75.3875968992248	69.65085049239033	82.15417106652588	73.4163508391987	78.51058998215277	75.3875968992248	69.65085049239033	82.15417106652588	73.4163508391987	78.51058998215277	75.3875968992248	69.65085049239033	82.15417106652588	73.03735787763942	78.4190891700083	75.32592950265573	69.3950177935943	82.36536430834214	73.4163508391987	78.51058998215277	75.3875968992248
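Pair classification turns a similarity score into a yes/no decision by thresholding. The sketch below picks the threshold that maximizes accuracy on the evaluated pairs, which is an assumption about how accuracy figures of this kind are typically obtained, not a description of the exact benchmark code. The scores and labels are invented.

```python
def best_threshold(scores, labels):
    """Return (threshold, accuracy) maximizing accuracy for 'score >= threshold' => 1."""
    best_thr, best_acc = None, -1.0
    for thr in sorted(set(scores)):
        preds = [1 if s >= thr else 0 for s in scores]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_thr, best_acc = thr, acc
    return best_thr, best_acc


# Hypothetical cosine similarities for five sentence pairs and their gold labels.
pair_scores = [0.91, 0.35, 0.78, 0.42, 0.66]
pair_labels = [1, 0, 1, 0, 0]

threshold, acc = best_threshold(pair_scores, pair_labels)
print(threshold, acc)
```

The same search can be repeated with dot product, Euclidean, or Manhattan scores (flipping the comparison for distances), which corresponds to the separate column groups in the table.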
