# 🚀 gte-large-zh

This model is based on thenlper/gte-large-zh and has been evaluated on multiple tasks and datasets in the MTEB benchmark, demonstrating its performance across natural language processing tasks such as sentence similarity, classification, clustering, reranking, and retrieval.
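The tags below suggest the model is used through the Sentence Transformers library. A minimal sketch of how similarity scores like those in the tables are obtained; the model download is left as a comment since it requires network access, and the toy vectors stand in for real embeddings:

```python
import numpy as np

# Real usage would load the base model via sentence-transformers
# (an assumption based on the tags; downloading the weights needs network access):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("thenlper/gte-large-zh")
#   embeddings = model.encode(["今天天气真好", "今天天气不错"])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-in embeddings; the real model produces much higher-dimensional vectors.
embeddings = np.array([[0.2, 0.8, 0.1],
                       [0.25, 0.75, 0.05]])
print(round(cosine_similarity(embeddings[0], embeddings[1]), 4))
```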
## 📚 Documentation

### Model Information

| Property | Details |
| ---- | ---- |
| Base Model | thenlper/gte-large-zh |
| License | MIT |
| Tags | mteb, sentence-similarity, sentence-transformers, Sentence Transformers, llama-cpp, gguf-my-repo |
### Model Performance Results

#### 1. STS Task Results

| Dataset | cos_sim_pearson | cos_sim_spearman | euclidean_pearson | euclidean_spearman | manhattan_pearson | manhattan_spearman |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| MTEB AFQMC (validation) | 48.94131905219026 | 54.58261199731436 | 52.73929210805982 | 54.582632097533676 | 52.73123295724949 | 54.572941830465794 |
| MTEB ATEC (test) | 47.292931669579005 | 54.601019783506466 | 54.61393532658173 | 54.60101865708542 | 54.59369555606305 | 54.601098593646036 |
| MTEB BQ (test) | 62.55033151404683 | 64.40573802644984 | 62.93453281081951 | 64.40574149035828 | 62.839969210895816 | 64.30837945045283 |
| MTEB LCQMC (test) | 69.50947272908907 | 74.40054474949213 | 73.53007373987617 | 74.40054474732082 | 73.51396571849736 | 74.38395696630835 |
| MTEB PAWSX (test) | 35.301730226895955 | 38.54612530948101 | 39.02831131230217 | 38.54612530948101 | 39.04765584936325 | 38.54455759013173 |
| MTEB QBQTC (test) | 32.27907454729754 | 33.35945567162729 | 31.997628193815725 | 33.3592386340529 | 31.97117833750544 | 33.30857326127779 |
| MTEB STS22 (zh) (test) | 62.53712784446981 | 62.975074386224286 | 61.791207731290854 | 62.975073716988064 | - | - |
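The STS columns are Pearson and Spearman correlations (×100) between model similarity scores and human gold labels. A rough sketch of how one such pair of numbers is computed, using toy scores rather than MTEB data (real MTEB runs use scipy's implementations):

```python
import numpy as np

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation coefficient between two score vectors."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman correlation: Pearson on rank-transformed values (no-ties case)."""
    rank = lambda v: np.argsort(np.argsort(v))
    return pearson(rank(x), rank(y))

# Toy stand-ins: model cosine similarities vs. human-annotated gold labels.
model_scores = np.array([0.91, 0.35, 0.78, 0.12, 0.66])
gold_labels = np.array([4.8, 1.5, 4.0, 0.5, 3.2])
print(round(pearson(model_scores, gold_labels) * 100, 2),
      round(spearman(model_scores, gold_labels) * 100, 2))
```

Spearman only looks at rank order, which is why it can reach 100 even when Pearson does not.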
#### 2. Classification Task Results

| Dataset | Accuracy | F1 | AP |
| ---- | ---- | ---- | ---- |
| MTEB AmazonReviewsClassification (zh) (test) | 47.233999999999995 | 45.68998446563349 | - |
| MTEB IFlyTek (validation) | 49.60369372835706 | 38.24016248875209 | - |
| MTEB JDReview (test) | 86.71669793621012 | 81.59033162805417 | 55.75807094995178 |
| MTEB MassiveIntentClassification (zh-CN) (test) | 73.30531271015468 | 70.88091430578575 | - |
| MTEB MassiveScenarioClassification (zh-CN) (test) | 75.7128446536651 | 75.06125593532262 | - |
| MTEB MultilingualSentiment (validation) | 76.47666666666667 | 76.4808576632057 | - |
| MTEB OnlineShopping (test) | 92.68 | 92.67153098230253 | 90.78652757815115 |
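In the classification table, Accuracy is the fraction of correct predictions, and F1 is taken here to be the unweighted mean of per-class F1 scores (macro averaging is an assumption about the MTEB setup, not stated in this card). A minimal sketch on toy labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    f1s = []
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Toy 3-class example: one item of class 0 is misclassified as class 1.
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
print(accuracy(y_true, y_pred), round(macro_f1(y_true, y_pred), 4))
```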
#### 3. Clustering Task Results

| Dataset | v_measure |
| ---- | ---- |
| MTEB CLSClusteringP2P (test) | 42.098169316685045 |
| MTEB CLSClusteringS2S (test) | 38.90716707051822 |
#### 4. Reranking Task Results

| Dataset | MAP | MRR |
| ---- | ---- | ---- |
| MTEB CMedQAv1 (test) | 86.09191911031553 | 88.6747619047619 |
| MTEB CMedQAv2 (test) | 86.45781885502122 | 89.01591269841269 |
| MTEB MMarcoReranking (dev) | 31.188333827724108 | 29.84801587301587 |
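The reranking table reports MAP and MRR; MRR averages, over queries, the reciprocal rank of the first relevant document. A self-contained sketch on toy relevance judgments (not CMedQA data):

```python
def mean_reciprocal_rank(rankings):
    """MRR: mean over queries of 1/rank of the first relevant result (0 if none)."""
    total = 0.0
    for ranking in rankings:
        for rank, relevant in enumerate(ranking, start=1):
            if relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)

# Three toy queries: first relevant hit at ranks 1, 2, and never.
print(mean_reciprocal_rank([[1, 0, 0], [0, 1, 0], [0, 0, 0]]))
```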
#### 5. Retrieval Task Results
| Dataset | map_at_1 | map_at_10 | map_at_100 | map_at_1000 | map_at_3 | map_at_5 | mrr_at_1 | mrr_at_10 | mrr_at_100 | mrr_at_1000 | mrr_at_3 | mrr_at_5 | ndcg_at_1 | ndcg_at_10 | ndcg_at_100 | ndcg_at_1000 | ndcg_at_3 | ndcg_at_5 | precision_at_1 | precision_at_10 | precision_at_100 | precision_at_1000 | precision_at_3 | precision_at_5 | recall_at_1 | recall_at_10 | recall_at_100 | recall_at_1000 | recall_at_3 | recall_at_5 |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| MTEB CmedqaRetrieval (dev) | 24.215 | 36.498000000000005 | 38.409 | 38.524 | 32.428000000000004 | 34.664 | 36.834 | 45.196 | 46.214 | 46.259 | 42.631 | 44.044 | 36.834 | 43.146 | 50.632999999999996 | 52.608999999999995 | 37.851 | 40.005 | 36.834 | 9.647 | 1.574 | 0.183 | 21.48 | 15.649 | 24.215 | 54.079 | 84.943 | 98.098 | 38.117000000000004 | 44.775999999999996 |
| MTEB CovidRetrieval (dev) | 78.583 | 85.613 | 85.777 | 85.77900000000001 | 84.58 | 85.22800000000001 | 78.925 | 85.667 | 85.822 | 85.824 | 84.651 | 85.299 | 78.925 | 88.405 | 89.02799999999999 | 89.093 | 86.393 | 87.5 | 78.925 | 9.789 | 1.005 | 0.101 | 30.769000000000002 | 19.031000000000002 | 78.583 | 96.891 | 99.473 | 100.0 | 91.438 | 94.152 |
| MTEB DuRetrieval (dev) | 25.604 | 77.171 | 80.033 | 80.099 | 54.364000000000004 | 68.024 | 89.85 | 93.009 | 93.065 | 93.068 | 92.72500000000001 | 92.915 | 89.85 | 85.038 | 88.247 | 88.837 | 85.20299999999999 | 83.47 | 89.85 | 40.275 | 4.709 | 0.486 | 76.36699999999999 | 63.75999999999999 | 25.604 | 85.423 | 95.695 | 98.669 | 56.737 | 72.646 |
| MTEB EcomRetrieval (dev) | 51.800000000000004 | 62.17 | 62.649 | 62.663000000000004 | 59.699999999999996 | 61.23499999999999 | 51.800000000000004 | 62.17 | 62.649 | 62.663000000000004 | 59.699999999999996 | 61.23499999999999 | 51.800000000000004 | 67.246 | 69.58 | 69.925 | 62.197 | 64.981 | 51.800000000000004 | 8.32 | 0.941 | 0.097 | 23.133 | 15.24 | 51.800000000000004 | 83.2 | 94.1 | 96.8 | 69.39999999999999 | 76.2 |
| MTEB MMarcoRetrieval (dev) | 64.685 | 73.803 | 74.153 | 74.167 | 71.98 | 73.21600000000001 | 66.891 | 74.48700000000001 | 74.788 | 74.801 | 72.918 | 73.965 | 66.891 | 77.534 | 79.106 | 79.494 | 74.13499999999999 | 76.20700000000001 | 66.891 | 9.375 | 1.0170000000000001 | 0.105 | 27.932000000000002 | 17.86 | 64.685 | 88.298 | 95.426 | 98.48700000000001 | 79.44200000000001 | 84.358 |
| MTEB MedicalRetrieval (dev) | 52.7 | 59.532 | 60.085 | 60.126000000000005 | 57.767 | 58.952000000000005 | 52.900000000000006 | 59.648999999999994 | 60.20100000000001 | 60.242 | 57.882999999999996 | 59.068 | 52.7 | 62.883 | 65.714 | 66.932 | 59.34700000000001 | 61.486 | 52.7 | 7.340000000000001 | 0.8699999999999999 | 0.097 | 21.3 | 13.819999999999999 | 52.7 | 73.4 | 87.0 | 96.8 | 63.9 | 69.1 |
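The retrieval columns follow the `metric_at_k` convention: each metric is computed over the top-k retrieved documents. As one example, nDCG@k divides the discounted gain of the system's ranking by that of the ideal ranking; a sketch on toy relevance judgments, using the standard log2 discount:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results in ranked order."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k):
    """nDCG@k: DCG of the system ranking divided by the ideal (sorted) DCG."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0

# Toy judgments for one query (1 = relevant, 0 = not),
# in the order the retriever returned the documents.
ranked = [1, 0, 1, 1, 0]
print(round(ndcg_at_k(ranked, 5), 4))
```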
#### 6. Pair Classification Task Results

| Metric | MTEB Cmnli (validation) | MTEB Ocnli (validation) |
| ---- | ---- | ---- |
| cos_sim_accuracy | 82.51352976548407 | 77.58527341635084 |
| cos_sim_ap | 89.49905141462749 | 79.32131557636497 |
| cos_sim_f1 | 83.89334489486234 | 80.51948051948052 |
| cos_sim_precision | 78.19761567993534 | 71.7948717948718 |
| cos_sim_recall | 90.48398410100538 | 91.65786694825766 |
| dot_accuracy | 82.51352976548407 | 77.58527341635084 |
| dot_ap | 89.49108293121158 | 79.32131557636497 |
| dot_f1 | 83.89334489486234 | 80.51948051948052 |
| dot_precision | 78.19761567993534 | 71.7948717948718 |
| dot_recall | 90.48398410100538 | 91.65786694825766 |
| euclidean_accuracy | 82.51352976548407 | 77.58527341635084 |
| euclidean_ap | 89.49904709975154 | 79.32131557636497 |
| euclidean_f1 | 83.89334489486234 | 80.51948051948052 |
| euclidean_precision | 78.19761567993534 | 71.7948717948718 |
| euclidean_recall | 90.48398410100538 | 91.65786694825766 |
| manhattan_accuracy | 82.48947684906794 | 77.15213860314023 |
| manhattan_ap | 89.49231995962901 | 79.26178519246496 |
| manhattan_f1 | 83.84681215233205 | 80.22028453418999 |
| manhattan_precision | 77.28258726089528 | 70.94155844155844 |
| manhattan_recall | 91.62964694879588 | 92.29144667370645 |
| max_accuracy | 82.51352976548407 | 77.58527341635084 |
| max_ap | 89.49905141462749 | 79.32131557636497 |
| max_f1 | 83.89334489486234 | 80.51948051948052 |
## 📄 License
This project is licensed under the MIT license.