Conan-embedding-v1 Open-source Chinese Text Embedding Model

Conan-embedding-v1 Q4_K_M GGUF

Developed by KenLi315

Conan-embedding-v1 is a Chinese text embedding model developed by the Tencent BAC team, focusing on semantic representation and similarity calculation for Chinese text.

Text Embedding, Chinese #Chinese Text Embedding #Multi-task Evaluation #High-precision Semantic Similarity

Downloads 48

Release Time: 1/28/2025

Model Overview

This model is primarily used for semantic embedding representation of Chinese text, supporting various natural language processing tasks such as text similarity calculation, classification, clustering, and retrieval.

Model Features

Chinese Optimization

Specifically optimized for Chinese text, capable of better capturing Chinese semantic features.

Multi-task Support

Supports various natural language processing tasks, including text similarity calculation, classification, clustering, and retrieval.

High Performance

Performs strongly on multiple Chinese benchmarks, particularly semantic similarity tasks.

Model Capabilities

Text embedding

Semantic similarity calculation

Text classification

Text clustering

Information retrieval

Re-ranking

Use Cases

Information Retrieval

Medical Q&A Retrieval

Used in medical Q&A retrieval systems to help users quickly find relevant medical information.

Achieves a map@100 of 42.495 on the CmedqaRetrieval dataset.

Text Similarity

Q&A Pair Matching

Determines the semantic relevance between questions and answers.

Achieves a cos_sim_spearman of 74.507 on the BQ dataset.

Text Classification

Product Review Classification

Performs sentiment classification on product reviews from e-commerce platforms.

Achieves an accuracy of 90.319% on the JDReview classification task.

🚀 conan-embedding

conan-embedding is an embedding model based on TencentBAC/Conan-embedding-v1. It is tagged for the sentence-transformers library, and its results on the MTEB benchmark are listed under Model Results below.
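A minimal sketch of how an embedding model like this is typically used for similarity ranking. It assumes the sentence-transformers package is installed and that the base checkpoint TencentBAC/Conan-embedding-v1 loads through `SentenceTransformer`; the GGUF quantization described on this page would instead be served through llama.cpp, which is not shown. The helper names `load_encoder`, `cosine`, and `rank_by_similarity` are illustrative and not part of any library.

```python
import math


def load_encoder(model_name="TencentBAC/Conan-embedding-v1"):
    """Return an encode(texts) function backed by sentence-transformers.

    Assumption: the package is installed and the checkpoint can be
    downloaded. The import is lazy so the helpers below work without it.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return lambda texts: model.encode(texts, normalize_embeddings=True)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_similarity(encode, query, documents):
    """Return (document, score) pairs sorted from most to least similar."""
    vectors = encode([query] + list(documents))
    query_vec, doc_vecs = vectors[0], vectors[1:]
    scored = [(doc, cosine(query_vec, vec)) for doc, vec in zip(documents, doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With the package available, `rank_by_similarity(load_encoder(), query, candidates)` would return the candidate texts ordered by cosine score. Passing the encoder in as a function keeps the ranking logic independent of how the embeddings are produced.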

📚 Documentation

Model Information

Property	Details
Tags	mteb, llama-cpp, gguf-my-repo
Library Name	sentence-transformers
Base Model	TencentBAC/Conan-embedding-v1
License	cc-by-nc-4.0

Model Results

The model has been evaluated on various tasks and datasets, with the following performance metrics:

1. STS (Semantic Textual Similarity) Tasks

MTEB AFQMC:
- cos_sim_pearson: 56.613572467148856
- cos_sim_spearman: 60.66446211824284
- euclidean_pearson: 58.42080485872613
- euclidean_spearman: 59.82750030458164
- manhattan_pearson: 58.39885271199772
- manhattan_spearman: 59.817749720366734
MTEB ATEC:
- cos_sim_pearson: 56.60530380552331
- cos_sim_spearman: 58.63822441736707
- euclidean_pearson: 62.18551665180664
- euclidean_spearman: 58.23168804495912
- manhattan_pearson: 62.17191480770053
- manhattan_spearman: 58.22556219601401
MTEB BQ:
- cos_sim_pearson: 72.6472074172711
- cos_sim_spearman: 74.50748447236577
- euclidean_pearson: 72.51833296451854
- euclidean_spearman: 73.9898922606105
- manhattan_pearson: 72.50184948939338
- manhattan_spearman: 73.97797921509638
MTEB LCQMC:
- cos_sim_pearson: 73.18906216730208
- cos_sim_spearman: 79.44570226735877
- euclidean_pearson: 78.8105072242798
- euclidean_spearman: 79.15605680863212
- manhattan_pearson: 78.80576507484064
- manhattan_spearman: 79.14625534068364
MTEB PAWSX:
- cos_sim_pearson: 40.01998290519605
- cos_sim_spearman: 46.5989769986853
- euclidean_pearson: 45.37905883182924
- euclidean_spearman: 46.22213849806378
- manhattan_pearson: 45.40925124776211
- manhattan_spearman: 46.250705124226386
MTEB QBQTC:
- cos_sim_pearson: 42.719516197112526
- cos_sim_spearman: 44.57507789581106
- euclidean_pearson: 35.73062264160721
- euclidean_spearman: 40.473523909913695
- manhattan_pearson: 35.69868964086357
- manhattan_spearman: 40.46349925372903
MTEB STS22 (zh):
- cos_sim_pearson: 62.340118285801104
- cos_sim_spearman: 67.72781908620632
- euclidean_pearson: 63.161965746091596
- euclidean_spearman: 67.36825684340769
- manhattan_pearson: 63.089863788261425
- manhattan_spearman: 67.40868898995384
MTEB STSB: (Metrics not fully provided in the original)
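
The cos_sim_pearson and cos_sim_spearman figures above can be read as correlations between the cosine similarity of each sentence pair and its gold similarity label. The following is a sketch under that reading, not the MTEB implementation; the function names are illustrative, and the multiplication by 100 is an assumption made to match the scale of the numbers listed here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j share the mean 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cos_sim_scores(embedding_pairs, gold):
    """Correlate per-pair cosine similarity with gold labels (percent scale)."""
    sims = [cosine(u, v) for u, v in embedding_pairs]
    return {
        "cos_sim_pearson": 100 * pearson(sims, gold),
        # Spearman is Pearson computed on the ranks of both sequences.
        "cos_sim_spearman": 100 * pearson(average_ranks(sims), average_ranks(gold)),
    }
```

Spearman only depends on the ordering of the similarities, which is why it can be higher than Pearson when the model ranks pairs correctly but its scores are not linearly related to the labels.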

2. Classification Tasks

MTEB AmazonReviewsClassification (zh):
- accuracy: 50.308
- f1: 46.927458607895126
MTEB IFlyTek:
- accuracy: 51.94305502116199
- f1: 39.82197338426721
MTEB JDReview:
- accuracy: 90.31894934333957
- ap: 63.89821836499594
- f1: 85.93687177603624
MTEB MassiveIntentClassification (zh-CN):
- accuracy: 78.13718897108272
- f1: 74.07613180855328
MTEB MassiveScenarioClassification (zh-CN):
- accuracy: 86.20040349697376
- f1: 85.05282136519973
MTEB MultilingualSentiment:
- accuracy: 78.57666666666667
- f1: 78.23373528202681
MTEB OnlineShopping:
- accuracy: 95.06999999999998
- ap: 93.45104559324996
- f1: 95.06036329426092

3. Clustering Tasks

MTEB CLSClusteringP2P:
- v_measure: 60.63545326048343
MTEB CLSClusteringS2S:
- v_measure: 52.64834762325994
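
For reference, v_measure is the harmonic mean of homogeneity (each cluster contains a single class) and completeness (each class falls in a single cluster). The sketch below computes it from two label lists using the standard entropy-based definition; it is an illustration rather than the benchmark's own code, and the step that clusters the embeddings is omitted. Function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given), computed from co-occurrence counts."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(true_labels, cluster_labels):
    """Harmonic mean of homogeneity and completeness, in [0, 1]."""
    h_true, h_pred = entropy(true_labels), entropy(cluster_labels)
    homogeneity = 1.0 if h_true == 0 else 1 - conditional_entropy(true_labels, cluster_labels) / h_true
    completeness = 1.0 if h_pred == 0 else 1 - conditional_entropy(cluster_labels, true_labels) / h_pred
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)
```

The score does not depend on which integer names the clusters receive, only on how the items are grouped.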

4. Reranking Tasks

MTEB CMedQAv1:
- map: 91.38528814655234
- mrr: 93.35857142857144
MTEB CMedQAv2:
- map: 89.72084678877096
- mrr: 91.74380952380953
MTEB MMarcoReranking:
- map: 41.58107192600853
- mrr: 41.37063492063492
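
The map and mrr values above summarize how high the relevant candidates sit in each re-ranked list. A small sketch of the two standard definitions follows; each query is represented as a list of 0/1 relevance flags in ranked order. The function names and the percent scaling are assumptions for illustration, not the benchmark's code.

```python
def average_precision(ranked_relevance):
    """AP for one query: mean of precision values at each relevant position."""
    hits, total = 0, 0.0
    for position, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / position
    return total / hits if hits else 0.0


def reciprocal_rank(ranked_relevance):
    """1 / position of the first relevant item, or 0 if there is none."""
    for position, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            return 1.0 / position
    return 0.0


def map_and_mrr(queries):
    """Average AP and RR over all queries, on a percent scale."""
    n = len(queries)
    return {
        "map": 100 * sum(average_precision(q) for q in queries) / n,
        "mrr": 100 * sum(reciprocal_rank(q) for q in queries) / n,
    }
```

When every query has exactly one relevant candidate, average precision and reciprocal rank coincide, so map and mrr are equal in that case.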

5. Retrieval Tasks

MTEB CmedqaRetrieval:
- map_at_1: 26.987
- map_at_10: 40.675
- map_at_100: 42.495
- map_at_1000: 42.596000000000004
- map_at_3: 36.195
- map_at_5: 38.704
- mrr_at_1: 41.21
- mrr_at_10: 49.816
- mrr_at_100: 50.743
- mrr_at_1000: 50.77700000000001
- mrr_at_3: 47.312
- mrr_at_5: 48.699999999999996
- ndcg_at_1: 41.21
- ndcg_at_10: 47.606
- ndcg_at_100: 54.457
- ndcg_at_1000: 56.16100000000001
- ndcg_at_3: 42.108000000000004
- ndcg_at_5: 44.393
- precision_at_1: 41.21
- precision_at_10: 10.593
- precision_at_100: 1.609
- precision_at_1000: 0.183
- precision_at_3: 23.881
- precision_at_5: 17.339
- recall_at_1: 26.987
- recall_at_10: 58.875
- recall_at_100: 87.023
- recall_at_1000: 98.328
- recall_at_3: 42.265
- recall_at_5: 49.334
MTEB CovidRetrieval:
- map_at_1: 83.693
- map_at_10: 90.098
- map_at_100: 90.145
- map_at_1000: 90.146
- map_at_3: 89.445
- map_at_5: 89.935
- mrr_at_1: 83.878
- mrr_at_10: 90.007
- mrr_at_100: 90.045
- mrr_at_1000: 90.046
- mrr_at_3: 89.34
- mrr_at_5: 89.835
- ndcg_at_1: 84.089
- ndcg_at_10: 92.351
- ndcg_at_100: 92.54599999999999
- ndcg_at_1000: 92.561
- ndcg_at_3: 91.15299999999999
- ndcg_at_5: 91.968
- precision_at_1: 84.089
- precision_at_10: 10.011000000000001
- precision_at_100: 1.009
- precision_at_1000: 0.101
- precision_at_3: 32.28
- precision_at_5: 19.789
- recall_at_1: 83.693
- recall_at_10: 99.05199999999999
- recall_at_100: 99.895
- recall_at_1000: 100
- recall_at_3: 95.917
- recall_at_5: 97.893
MTEB DuRetrieval:
- map_at_1: 26.924
- map_at_10: 81.392
- map_at_100: 84.209
- map_at_1000: 84.237
- map_at_3: 56.998000000000005
- map_at_5: 71.40100000000001
- mrr_at_1: 91.75
- mrr_at_10: 94.45
- mrr_at_100: 94.503
- mrr_at_1000: 94.505
- mrr_at_3: 94.258
- mrr_at_5: 94.381
- ndcg_at_1: 91.75
- ndcg_at_10: 88.53
- ndcg_at_100: 91.13900000000001
- ndcg_at_1000: 91.387
- ndcg_at_3: 87.925
- ndcg_at_5: 86.461
- precision_at_1: 91.75
- precision_at_10: 42.05
- precision_at_100: 4.827
- precision_at_1000: 0.48900000000000005
- precision_at_3: 78.55
- precision_at_5: 65.82000000000001
- recall_at_1: 26.924
- recall_at_10: 89.338
- recall_at_100: 97.856
- recall_at_1000: 99.11
- recall_at_3: 59.202999999999996
- recall_at_5: 75.642
MTEB EcomRetrieval:
- map_at_1: 54.800000000000004
- map_at_10: 65.613
- map_at_100: 66.185
- map_at_1000: 66.191
- map_at_3: 62.8
- map_at_5: 64.535
- mrr_at_1: 54.800000000000004
- mrr_at_10: 65.613
- mrr_at_100: 66.185
- mrr_at_1000: 66.191
- mrr_at_3: 62.8
- mrr_at_5: 64.535
- ndcg_at_1: 54.800000000000004
- ndcg_at_10: 70.991
- ndcg_at_100: 73.434
- ndcg_at_1000: 73.587
- ndcg_at_3: 65.324
- ndcg_at_5: 68.431
- precision_at_1: 54.800000000000004
- precision_at_10: 8.790000000000001
- precision_at_100: 0.9860000000000001
- precision_at_1000: 0.1
- precision_at_3: 24.2
- precision_at_5: 16.02
- recall_at_1: 54.800000000000004
- recall_at_10: 87.9
- recall_at_100: 98.6
- recall_at_1000: 99.8
- recall_at_3: 72.6
- recall_at_5: 80.10000000000001
MTEB MMarcoRetrieval:
- map_at_1: 68.33
- map_at_10: 78.261
- map_at_100: 78.522
- map_at_1000: 78.527
- map_at_3: 76.236
- map_at_5: 77.557
- mrr_at_1: 70.602
- mrr_at_10: 78.779
- mrr_at_100: 79.00500000000001
- mrr_at_1000: 79.01
- mrr_at_3: 77.037
- mrr_at_5: 78.157
- ndcg_at_1: 70.602
- ndcg_at_10: 82.254
- ndcg_at_100: 83.319
- ndcg_at_1000: 83.449
- ndcg_at_3: 78.46
- ndcg_at_5: 80.679
- precision_at_1: 70.602
- precision_at_10: 9.989
- precision_at_100: 1.05
- precision_at_1000: 0.106
- precision_at_3: 29.598999999999997
- precision_at_5: 18.948
- recall_at_1: 68.33
- recall_at_10: 94.00800000000001
- recall_at_100: 98.589
- recall_at_1000: 99.60799999999999
- recall_at_3: 84.057
- recall_at_5: 89.32900000000001
MTEB MedicalRetrieval:
- map_at_1: 56.8
- map_at_10: 64.199
- map_at_100: 64.89
- map_at_1000: 64.917
- map_at_3: 62.383
- map_at_5: 63.378
- mrr_at_1: 56.8
- mrr_at_10: 64.199
- mrr_at_100: 64.89
- mrr_at_1000: 64.917
- mrr_at_3: 62.383
- mrr_at_5: 63.378
- ndcg_at_1: 56.8
- ndcg_at_10: 67.944
- ndcg_at_100: 71.286
- ndcg_at_1000: 71.879
- ndcg_at_3: 64.163
- ndcg_at_5: 65.96600000000001
- precision_at_1: 56.8
- precision_at_10: 7.9799999999999995
- precision_at_100: 0.954
- precision_at_1000: 0.1
- precision_at_3: 23.1
- precision_at_5: 14.74
- recall_at_1: 56.8
- recall_at_10: 79.80000000000001
- recall_at_100: 95.39999999999999
- recall_at_1000: 99.8
- recall_at_3: 69.3
- recall_at_5: 73.7
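
The `_at_k` suffixes above refer to cutoffs in the ranked result list. The following sketch shows the usual binary-relevance definitions of precision, recall, and nDCG at a cutoff k for a single query. It is an illustration with hypothetical function names and document ids; the benchmark's exact evaluation code may differ in details such as graded relevance.

```python
import math


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant (always divided by k)."""
    return sum(1 for doc in ranked_ids[:k] if doc in relevant_ids) / k


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant_ids:
        return 0.0
    return sum(1 for doc in ranked_ids[:k] if doc in relevant_ids) / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance nDCG@k: discounted gain divided by the ideal gain."""
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, doc in enumerate(ranked_ids[:k])
        if doc in relevant_ids
    )
    ideal_hits = min(k, len(relevant_ids))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg else 0.0
```

Because precision divides by k regardless of how many relevant documents exist, a query with a single relevant document can reach at most 1/k, which is why precision at large cutoffs is small even when recall is close to 100.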

6. PairClassification Tasks

MTEB Cmnli:
- cos_sim_accuracy: 85.91701743836441
- cos_sim_ap: 92.53650618807644
- cos_sim_f1: 86.80265975431082
- cos_sim_precision: 83.79025239338556
- cos_sim_recall: 90.039747486556
- dot_accuracy: 77.17378232110643
- dot_ap: 85.40244368166546
- dot_f1: 79.03038001481951
- dot_precision: 72.20502901353966
- dot_recall: 87.2808043020809
- euclidean_accuracy: 84.65423932651834
- euclidean_ap: 91.47775530034588
- euclidean_f1: 85.64471499723298
- euclidean_precision: 81.31567885666246
- euclidean_recall: 90.46060322656068
- manhattan_accuracy: 84.58208057726999
- manhattan_ap: 91.46228709402014
- manhattan_f1: 85.6631626034444
- manhattan_precision: 82.10075026795283
- manhattan_recall: 89.5487491232172
- max_accuracy: 85.91701743836441
- max_ap: 92.53650618807644
- max_f1: 86.80265975431082
MTEB Ocnli:
- cos_sim_accuracy: 85.43584190579317
- cos_sim_ap: 90.76665640338129
- cos_sim_f1: 86.5021770682148
- cos_sim_precision: 79.82142857142858
- cos_sim_recall: 94.40337909186906
- dot_accuracy: 78.66811044937737
- dot_ap: 85.84084363880804
- dot_f1: 80.10075566750629
- dot_precision: 76.58959537572254
- dot_recall: 83.9493136219641
- euclidean_accuracy: 84.46128857606931
- euclidean_ap: 88.62351100230491
- euclidean_f1: 85.7709469509172
- euclidean_precision: 80.8411214953271
- euclidean_recall: 91.34107708553326
- manhattan_accuracy: 84.51543042772063
- manhattan_ap: 88.53975607870393
- manhattan_f1: 85.75697211155378
- manhattan_precision: 81.14985862393968
- manhattan_recall: 90.91869060190075
- max_accuracy: 85.43584190579317
- max_ap: 90.76665640338129
- max_f1: 86.5021770682148
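
In the pair classification results, each similarity function (cos_sim, dot, euclidean, manhattan) gets its own accuracy, and the max_* rows match the largest value among them. One plausible reading, sketched below, is that a pair is labeled positive when its score clears a threshold, the threshold is chosen to maximize accuracy, and the maximum is then taken across similarity functions. This is an assumption about the procedure, and the function names are illustrative.

```python
def best_threshold_accuracy(scores, labels):
    """Return (accuracy, threshold) for the best rule `score >= threshold`."""
    best_acc, best_thr = 0.0, None
    # Every observed score is a candidate; infinity means "predict all negative".
    candidates = sorted(set(scores)) + [float("inf")]
    for thr in candidates:
        correct = sum((s >= thr) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(labels)
        if acc > best_acc:
            best_acc, best_thr = acc, thr
    return best_acc, best_thr


def max_accuracy(scores_by_metric, labels):
    """Best accuracy per similarity function, plus the maximum across them.

    Distances (euclidean, manhattan) should be negated before being passed
    in, so that a higher score always means "more similar".
    """
    per_metric = {
        name: best_threshold_accuracy(scores, labels)[0]
        for name, scores in scores_by_metric.items()
    }
    return max(per_metric.values()), per_metric
```

Selecting the threshold on the same data that is scored gives an optimistic accuracy, so these figures are best compared only against other models evaluated the same way.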

📄 License

This model is released under the cc-by-nc-4.0 license.
