Mini Gte
Developed by prdev
A lightweight sentence embedding model based on DistilBERT, suitable for various text processing tasks.
Downloads: 1,240
Release Time: 1/29/2025
Model Overview
mini-gte is a lightweight sentence embedding model based on the DistilBERT architecture, primarily designed for natural language processing tasks such as text classification, information retrieval, and clustering. The model demonstrates excellent performance across multiple MTEB benchmarks, making it particularly suitable for scenarios requiring efficient text representation.
Model Features
Lightweight and Efficient
Built on the DistilBERT architecture, it significantly reduces model size while maintaining performance
Multi-task Support
Performs well in various tasks including text classification, information retrieval, and clustering
Excellent Benchmark Performance
Achieves competitive results across multiple MTEB benchmarks
Model Capabilities
Text Classification
Information Retrieval
Text Clustering
Sentence Embedding Generation
Use Cases
E-commerce
Product Review Sentiment Analysis
Analyze sentiment tendencies in Amazon product reviews
Achieved 92.94% accuracy in Amazon polarity classification task
Counterfactual Review Detection
Identify counterfactual reviews on Amazon platform
Achieved 74.90% accuracy in Amazon counterfactual classification task
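A common recipe for classification tasks like these is to encode each review with the embedding model and train a lightweight classifier on top of the vectors. The sketch below uses toy 2-D vectors in place of real `model.encode(...)` outputs (hypothetical data, not the card's evaluation setup):

```python
# Sketch: sentiment classification on top of sentence embeddings.
# The 2-D vectors below are hypothetical stand-ins for model.encode(reviews).
import numpy as np
from sklearn.linear_model import LogisticRegression

X_train = np.array([[0.9, 0.1], [0.8, 0.2],   # "positive-leaning" embeddings
                    [0.1, 0.9], [0.2, 0.8]])  # "negative-leaning" embeddings
y_train = [1, 1, 0, 0]  # 1 = positive review, 0 = negative review

clf = LogisticRegression().fit(X_train, y_train)
print(clf.predict([[0.85, 0.15]]))  # a new embedding near the positive cluster
```

With real data, `X_train` would be the embeddings of labeled reviews; the classifier stays cheap because all semantic modeling happens in the frozen encoder.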
Academic Research
Paper Clustering
Topic clustering for arXiv papers
Achieved V-measure of 47.25 in arXiv paper clustering task
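Topic clustering of this kind typically runs k-means (or a similar algorithm) over the sentence embeddings and scores the result with V-measure, the metric reported above. The sketch uses toy vectors as hypothetical stand-ins for `model.encode(titles)` outputs:

```python
# Sketch: clustering sentence embeddings and scoring with V-measure.
# The vectors below are hypothetical stand-ins for model.encode(paper_titles).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import v_measure_score

emb = np.array([[1.0, 0.0], [0.9, 0.1],   # two papers on one topic
                [0.0, 1.0], [0.1, 0.9]])  # two papers on another topic
labels_true = [0, 0, 1, 1]

labels_pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
print(v_measure_score(labels_true, labels_pred))  # 1.0 for a perfect split
```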
Information Retrieval
Argument Retrieval
Retrieve relevant arguments in debate datasets
Achieved NDCG@10 of 56.61 in ArguAna retrieval task
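Retrieval with an embedding model boils down to encoding the query and the documents, then ranking documents by cosine similarity. A minimal pure-NumPy sketch, with toy vectors standing in for `model.encode(...)` outputs (hypothetical data):

```python
# Sketch: top-k retrieval by cosine similarity over sentence embeddings.
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=2):
    # Normalize, then rank documents by dot product with the query.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(-scores)[:k]
    return order.tolist(), scores[order].tolist()

# Hypothetical stand-ins for model.encode(documents) / model.encode(query).
docs = np.array([[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]])
query = np.array([0.9, 0.1])
idx, _ = cosine_top_k(query, docs, k=2)
print(idx)  # → [0, 1]
```

Metrics like NDCG@10 are then computed over these ranked lists against relevance judgments.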
🚀 prdev/mini-gte
This is a model based on distilbert/distilbert-base-uncased, built with the sentence-transformers library. It has been evaluated on multiple datasets in the MTEB benchmark, covering NLP tasks such as classification, retrieval, clustering, and more.
📚 Documentation
Model Information
| Property | Details |
|----------|---------|
| Base Model | distilbert/distilbert-base-uncased |
| Library Name | sentence-transformers |
| Model Name | prdev/mini-gte |
Evaluation Results
1. MTEB AmazonCounterfactualClassification (en)
- Task Type: Classification

| Metric | Value |
|--------|-------|
| Accuracy | 74.8955 |
| F1 | 68.8421 |
| F1 Weighted | 77.1819 |
| AP | 37.7315 |
| AP Weighted | 37.7315 |
| Main Score | 74.8955 |
2. MTEB AmazonPolarityClassification (default)
- Task Type: Classification

| Metric | Value |
|--------|-------|
| Accuracy | 92.9424 |
| F1 | 92.9268 |
| F1 Weighted | 92.9268 |
| AP | 89.2255 |
| AP Weighted | 89.2255 |
| Main Score | 92.9424 |
3. MTEB AmazonReviewsClassification (en)
- Task Type: Classification

| Metric | Value |
|--------|-------|
| Accuracy | 53.0920 |
| F1 | 52.7353 |
| F1 Weighted | 52.7353 |
| Main Score | 53.0920 |
4. MTEB ArguAna (default)
- Task Type: Retrieval

| Metric | @1 | @3 | @5 | @10 | @20 | @100 | @1000 |
|--------|-----|-----|-----|------|------|-------|--------|
| NDCG | 31.792 | 47.206 | 51.843 | 56.614 | 59.212 | 60.149 | 60.231 |
| MAP | 31.792 | 43.350 | 45.928 | 47.929 | 48.674 | 48.825 | 48.828 |
| Recall | 31.792 | 58.393 | 69.630 | 84.211 | 94.239 | 99.004 | 99.644 |
| Precision | 31.792 | 19.464 | 13.926 | 8.421 | 4.712 | 0.990 | 0.100 |
| MRR | 32.4324 | 43.6463 | 46.1569 | 48.1582 | 48.9033 | 49.0537 | 49.0569 |
| NAUC NDCG Max | -4.8705 | -3.9160 | -2.5090 | -1.4653 | -2.4534 | -2.8207 | -3.0108 |
| NAUC NDCG Std | -9.1757 | -10.4240 | -10.1328 | -9.3154 | -9.0213 | -9.0492 | -9.2507 |
| NAUC NDCG Diff1 | 17.7430 | 12.3928 | 13.3086 | 13.7827 | 13.7644 | 14.3422 | 14.2345 |
| NAUC MAP Max | -4.8705 | -4.2874 | -3.5856 | -3.2553 | -3.5541 | -3.5812 | -3.5881 |
| NAUC MAP Std | -9.1757 | -10.1539 | -9.9657 | -9.6771 | -9.6286 | -9.6278 | -9.6335 |
| NAUC MAP Diff1 | 17.7430 | 13.6101 | 14.1354 | 14.4029 | 14.3927 | 14.4922 | 14.4884 |
| NAUC Recall Max | -4.8705 | -2.7195 | 1.7492 | 10.7433 | 14.8025 | 40.8859 | 24.2175 |
| NAUC Recall Std | -9.1757 | -11.2342 | -10.6963 | -6.3396 | 3.9196 | 57.9655 | 70.9234 |
| NAUC Recall Diff1 | 17.7430 | 8.7116 | 10.5690 | 10.6275 | 6.0286 | 30.7703 | 5.9272 |
| NAUC Precision Max | -4.8705 | -2.7195 | 1.7492 | 10.7433 | 14.8025 | 40.8859 | 24.2175 |
| NAUC Precision Std | -9.1757 | -11.2342 | -10.6963 | -6.3396 | 3.9196 | 57.9655 | 70.9234 |
| NAUC Precision Diff1 | 17.7430 | 8.7116 | 10.5690 | 10.6275 | 6.0286 | 30.7703 | 5.9272 |
| NAUC MRR Max | -5.1491 | -5.0832 | -4.5304 | -4.2387 | -4.5254 | -4.5576 | -4.5646 |
| NAUC MRR Std | -8.8127 | -9.8967 | -9.9006 | -9.6123 | -9.5502 | -9.5491 | -9.5548 |
| NAUC MRR Diff1 | 15.8571 | 11.9042 | 12.2957 | 12.4769 | 12.4674 | 12.5569 | 12.5529 |

Main Score: 56.614 (NDCG@10)
5. MTEB ArxivClusteringP2P (default)
- Task Type: Clustering

| Metric | Value |
|--------|-------|
| V-Measure | 47.2524 |
| V-Measure Std | 13.7772 |
| Main Score | 47.2524 |
6. MTEB ArxivClusteringS2S (default)
- Task Type: Clustering

| Metric | Value |
|--------|-------|
| V-Measure | 40.7262 |
| V-Measure Std | 14.1255 |
| Main Score | 40.7262 |
7. MTEB AskUbuntuDupQuestions (default)
- Task Type: Reranking

| Metric | Value |
|--------|-------|
| MAP | 61.5732 |
| MRR | 74.6714 |
| nAUC MAP Max | 21.8916 |
| nAUC MAP Std | 17.9941 |
| nAUC MAP Diff1 | 1.5548 |
| nAUC MRR Max | 34.1394 |
| nAUC MRR Std | 18.1335 |
| nAUC MRR Diff1 | 13.3597 |
| Main Score | 61.5732 |
8. MTEB BIOSSES (default)
- Task Type: STS

| Metric | Value |
|--------|-------|
| Pearson | 86.7849 |
| Spearman | 84.7302 |
| Cosine Pearson | 86.7849 |
| Cosine Spearman | 84.7302 |
| Manhattan Pearson | 84.4818 |
| Manhattan Spearman | 84.0507 |
| Euclidean Pearson | 84.8613 |
| Euclidean Spearman | 84.6266 |
| Main Score | 84.7302 |
9. MTEB Banking77Classification (default)
- Task Type: Classification

| Metric | Value |
|--------|-------|
| Accuracy | 85.7175 |
| F1 | 85.6781 |
| F1 Weighted | 85.6781 |
| Main Score | 85.7175 |
10. MTEB BiorxivClusteringP2P (default)
- Task Type: Clustering

| Metric | Value |
|--------|-------|
| V-Measure | 40.0588 |
| V-Measure Std | 0.8872 |
| Main Score | 40.0588 |
11. MTEB BiorxivClusteringS2S (default)
- Task Type: Clustering

| Metric | Value |
|--------|-------|
| V-Measure | 36.3828 |
| V-Measure Std | 1.1670 |
| Main Score | 36.3828 |
12. MTEB CQADupstackAndroidRetrieval (default)
- Task Type: Retrieval

| Metric | @1 | @3 | @5 | @10 | @20 | @100 | @1000 |
|--------|-----|-----|-----|------|------|-------|--------|
| NDCG | 37.196 | 42.778 | 45.014 | 47.973 | 50.141 | 53.314 | 55.520 |
| MAP | 30.598 | 38.173 | 40.093 | 41.686 | 42.522 | 43.191 | 43.328 |
| Recall | 30.598 | 45.020 | 51.357 | 60.260 | 67.933 | 82.070 | 96.345 |
| Precision | 37.196 | 20.553 | 14.707 | 9.213 | 5.522 | 1.495 | 0.198 |
| MRR | 37.196 | 44.4683 | 45.9776 | 47.1884 | 47.6763 | 47.9570 | 48.0103 |
| NAUC NDCG Max | 38.1056 | 35.8655 | 36.3806 | 36.6053 | 37.2333 | 38.1684 | 37.9110 |
| NAUC NDCG Std | -1.5731 | 0.2057 | 1.5420 | 2.7934 | 3.3346 | 4.6180 | 4.2068 |
| NAUC NDCG Diff1 | 52.3965 | 46.2996 | 45.3674 | 45.3474 | 45.6105 | 45.7041 | 46.0349 |
| NAUC MAP Max | 33.6794 | 35.2163 | 35.5840 | 35.8370 | 36.1877 | 36.4520 | 36.4422 |
| NAUC MAP Std | -0.7946 | -0.3286 | 0.4626 | 1.1463 | 1.5263 | 1.9580 | 1.9560 |
| NAUC MAP Diff1 | 55.7997 | 49.5727 | 48.6219 | 48.3025 | 48.2105 | 48.1781 | 48.1664 |
| NAUC Recall Max | 33.6794 | 33.5910 | 34.1456 | 34.2228 | 35.9338 | 43.0721 | 61.3455 |
| NAUC Recall Std | -0.7946 | 0.7802 | 3.8030 | 7.3944 | 9.6754 | 21.4935 | 66.2789 |
| NAUC Recall Diff1 | 55.7997 | 42.7281 | 39.3889 | 37.6609 | 36.6270 | 34.8090 | 43.5024 |
| NAUC Precision Max | 38.1056 | 31.2978 | 28.2209 | 21.8709 | 16.3885 | 4.6120 | -10.5997 |
| NAUC Precision Std | -1.5731 | 0.0904 | 3.6561 | 7.3919 | 9.8527 | 6.9627 | -4.5693 |
| NAUC Precision Diff1 | 52.3965 | 25.9668 | 16.3544 | 4.4909 | -3.9433 | -14.0135 | -21.0926 |
| NAUC MRR Max | 38.1056 | 37.4199 | 38.1046 | 38.1046 | 38.1046 | 38.1046 | 38.1046 |
| NAUC MRR Std | -1.5731 | -0.5046 | -0.2840 | -0.2840 | -0.2840 | -0.2840 | -0.2840 |
| NAUC MRR Diff1 | 52.3965 | 46.5936 | 46.7950 | 46.7950 | 46.7950 | 46.7950 | 46.7950 |

Main Score: 47.973 (NDCG@10)
Featured Recommended AI Models

Jina Embeddings V3
Jina Embeddings V3 is a multilingual sentence embedding model supporting over 100 languages, specializing in sentence similarity and feature extraction tasks.
Text Embedding · Transformers · Supports Multiple Languages
jinaai · 3.7M · 911
Ms Marco MiniLM L6 V2
Apache-2.0
A cross-encoder model trained on the MS Marco passage ranking task for query-passage relevance scoring in information retrieval
Text Embedding · English
cross-encoder · 2.5M · 86
Opensearch Neural Sparse Encoding Doc V2 Distill
Apache-2.0
A sparse retrieval model based on distillation technology, optimized for OpenSearch, supporting inference-free document encoding with improved search relevance and efficiency over V1
Text Embedding · Transformers · English
opensearch-project · 1.8M · 7
Sapbert From PubMedBERT Fulltext
Apache-2.0
A biomedical entity representation model based on PubMedBERT, optimized for semantic relation capture through self-aligned pre-training
Text Embedding · English
cambridgeltl · 1.7M · 49
Gte Large
MIT
GTE-Large is a powerful sentence transformer model focused on sentence similarity and text embedding tasks, excelling in multiple benchmark tests.
Text Embedding · English
thenlper · 1.5M · 278
Gte Base En V1.5
Apache-2.0
GTE-base-en-v1.5 is an English sentence transformer model focused on sentence similarity tasks, excelling in multiple text embedding benchmarks.
Text Embedding · Transformers · Supports Multiple Languages
Alibaba-NLP · 1.5M · 63
Gte Multilingual Base
Apache-2.0
GTE Multilingual Base is a multilingual sentence embedding model supporting over 50 languages, suitable for tasks like sentence similarity calculation.
Text Embedding · Transformers · Supports Multiple Languages
Alibaba-NLP · 1.2M · 246
Polybert
polyBERT is a chemical language model designed to achieve fully machine-driven ultrafast polymer informatics. It maps PSMILES strings into 600-dimensional dense fingerprints to numerically represent polymer chemical structures.
Text Embedding · Transformers
kuelumbus · 1.0M · 5
Bert Base Turkish Cased Mean Nli Stsb Tr
Apache-2.0
A sentence embedding model based on Turkish BERT, optimized for semantic similarity tasks
Text Embedding · Transformers · Other
emrecan · 1.0M · 40
GIST Small Embedding V0
MIT
A text embedding model fine-tuned based on BAAI/bge-small-en-v1.5, trained with the MEDI dataset and MTEB classification task datasets, optimized for query encoding in retrieval tasks.
Text Embedding · Safetensors · English
avsolatorio · 945.68k · 29