# 🚀 gte-qwen2-7B-instruct
This model is based on Alibaba-NLP/gte-Qwen2-1.5B-instruct and has been evaluated on multiple NLP tasks from the MTEB benchmark, showing strong performance in classification, retrieval, clustering, and other tasks.
## ✨ Features

- Multiple Task Support: Supports a range of NLP tasks, including classification, retrieval, clustering, reranking, and STS (semantic textual similarity).
- Broad Benchmark Coverage: Evaluated on many MTEB datasets, including Amazon-related classification datasets, retrieval datasets such as ArguAna, and clustering datasets such as ArxivClustering.
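Because this is an embedding model, most of these downstream tasks reduce to comparing vectors. Below is a minimal sketch of cosine-similarity ranking; the vectors are toy placeholders standing in for real model embeddings, and loading the MLX checkpoint itself is not shown here:

```python
import numpy as np

def cosine_similarity(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and a matrix of doc vectors."""
    query = query / np.linalg.norm(query)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return docs @ query

# Toy stand-ins for embeddings the model would produce (dimension is illustrative).
query = np.array([0.9, 0.1, 0.0])
docs = np.array([
    [1.0, 0.0, 0.0],  # close to the query
    [0.0, 1.0, 0.0],  # unrelated
    [0.0, 0.0, 1.0],  # unrelated
])

scores = cosine_similarity(query, docs)
ranking = np.argsort(-scores)  # indices of documents, best match first
```

With real embeddings, `query` and `docs` would come from encoding text with the model; the ranking step is unchanged.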
## 📚 Documentation

### Model Information
| Property | Details |
|----------|---------|
| Model Type | gte-qwen2-7B-instruct |
| Base Model | Alibaba-NLP/gte-Qwen2-1.5B-instruct |
| Library Name | mlx |
| License | apache-2.0 |
| Pipeline Tag | text-generation |
### Performance Metrics

The model has been evaluated on multiple MTEB tasks and datasets; the metrics below are grouped by task type.
#### Classification Tasks

| Dataset | Accuracy | AP | F1 |
|---------|----------|----|----|
| MTEB AmazonCounterfactualClassification (en) | 83.98507462686567 | 50.93015252587014 | 78.50416599051215 |
| MTEB AmazonPolarityClassification | 96.61065 | 94.89174052954196 | 96.60942596940565 |
| MTEB AmazonReviewsClassification (en) | 55.614 | - | 54.90553480294904 |
| MTEB Banking77Classification | 87.30844155844156 | - | 87.25307322443255 |
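The Accuracy, AP, and F1 columns above are standard classification metrics that can be computed with scikit-learn. A toy sketch (the labels and scores are illustrative, and scikit-learn is assumed to be installed):

```python
from sklearn.metrics import accuracy_score, average_precision_score, f1_score

# Toy binary labels and scores; MTEB computes the same metrics over real predictions.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.7]  # classifier confidence for class 1

acc = accuracy_score(y_true, y_pred)            # fraction of exact matches
ap = average_precision_score(y_true, y_score)   # area under the precision-recall curve
f1 = f1_score(y_true, y_pred, average="macro")  # per-class F1, macro-averaged
```

Note that AP is computed from continuous scores, which is why it can diverge noticeably from accuracy and F1 in the table above.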
#### Retrieval Tasks

| Dataset | MAP@1 | MAP@10 | MAP@100 | MAP@1000 | MRR@1 | MRR@10 | MRR@100 | MRR@1000 | NDCG@1 | NDCG@10 | NDCG@100 | NDCG@1000 | Precision@1 | Precision@10 | Precision@100 | Precision@1000 | Recall@1 | Recall@10 | Recall@100 | Recall@1000 | Recall@3 | Recall@5 |
|---------|-------|--------|---------|----------|-------|--------|---------|----------|--------|---------|----------|-----------|-------------|--------------|---------------|----------------|----------|-----------|------------|-------------|----------|----------|
| MTEB ArguAna | 45.164 | 61.519 | 61.769 | 61.769 | 46.088 | 61.861 | 62.118 | 62.118 | 45.164 | 69.72 | 70.719 | 70.719 | 45.164 | 9.545 | 0.996 | 0.1 | 45.164 | 95.448 | 99.644 | 99.644 | 73.329 | 84.851 |
| MTEB CQADupstackAndroidRetrieval | 35.423 | 47.198 | 48.899 | 49.004 | 42.918 | 53.299 | 54.032 | 54.055 | 42.918 | 53.98 | 59.57 | 60.879 | 42.918 | 10.3 | 1.687 | 0.211 | 35.423 | 66.824 | 89.564 | 97.501 | 50.365 | 57.921 |
| MTEB CQADupstackEnglishRetrieval | 33.205 | 44.859 | 46.135 | 46.259 | 41.146 | 50.621 | 51.207 | 51.246 | 41.146 | 50.683 | 54.82 | 56.69 | 41.146 | 9.439 | 1.465 | 0.194 | 33.205 | 61.029 | 78.152 | 89.597 | 49.05 | 54.836 |
| MTEB CQADupstackGamingRetrieval | 41.637 | 55.162 | 56.142 | 56.188 | 47.524 | 58.243 | 58.88 | 58.9 | 47.524 | 61.305 | 65.077 | 65.941 | 47.524 | 9.918 | 1.276 | 0.139 | 41.637 | 76.185 | 92.149 | 98.199 | 60.856 | 68.251 |
| MTEB CQADupstackGisRetrieval | 26.27 | 37.463 | 38.434 | 38.509 | 28.588 | 39.383 | 40.23 | 40.281 | 28.588 | 43.511 | 48.274 | 49.975 | 28.588 | 6.893 | 0.99 | 0.117 | 26.27 | 60.284 | 81.902 | 94.43 | 43.537 | 51.475 |
| MTEB CQADupstackMathematicaRetrieval | 18.168 | 28.41 | 29.78 | 29.893 | 23.507 | 33.382 | 34.404 | 34.468 | 23.507 | 34.571 | 40.663 | 43.236 | 23.507 | 6.654 | 1.113 | - | 18.168 | - | - | - | - | - |
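The NDCG@k columns above measure ranking quality normalized against an ideal ordering of the same results. A simplified pure-Python sketch of the computation (MTEB itself uses established evaluation tooling; this is only illustrative, with binary relevance grades):

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k ranked relevance grades."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance of the top 3 retrieved documents, in ranked order:
score = ndcg_at_k([1, 0, 1], 3)  # ≈ 0.9197: a relevant doc was ranked too low
```

The `@1`, `@10`, `@100`, and `@1000` columns in the table correspond to different values of `k`.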
#### Clustering Tasks

| Dataset | V-Measure |
|---------|-----------|
| MTEB ArxivClusteringP2P | 50.511868162026175 |
| MTEB ArxivClusteringS2S | 45.007803189284004 |
| MTEB BiorxivClusteringP2P | 43.20754608934859 |
| MTEB BiorxivClusteringS2S | 38.818037697335505 |
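V-Measure, the clustering metric reported above, is the harmonic mean of homogeneity and completeness. A small sketch using scikit-learn (assumed to be installed) shows its key property, invariance to how the cluster IDs are numbered:

```python
from sklearn.metrics import v_measure_score

# Two labelings with the same grouping but different cluster IDs.
true_labels      = [0, 0, 1, 1, 2, 2]
predicted_labels = [2, 2, 0, 0, 1, 1]

# Perfect grouping scores 1.0 regardless of the ID permutation.
score = v_measure_score(true_labels, predicted_labels)
```

This permutation invariance is why V-Measure suits clustering, where cluster IDs are arbitrary.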
#### Reranking Tasks

| Dataset | MAP | MRR |
|---------|-----|-----|
| MTEB AskUbuntuDupQuestions | 64.55292107723382 | 77.66158818097877 |
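The MRR column averages, over all queries, the reciprocal rank of the first relevant result. A minimal illustrative implementation:

```python
def mean_reciprocal_rank(first_relevant_ranks):
    """first_relevant_ranks: for each query, the 1-based rank of the first
    relevant result, or None if nothing relevant was retrieved."""
    total = sum(1.0 / rank for rank in first_relevant_ranks if rank is not None)
    return total / len(first_relevant_ranks)

# Query 1: hit at rank 1; query 2: hit at rank 2; query 3: no hit.
mrr = mean_reciprocal_rank([1, 2, None])  # (1 + 0.5 + 0) / 3 = 0.5
```

MAP additionally averages precision over every relevant result per query, not just the first, which is why the two columns differ.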
#### STS Tasks

| Dataset | Cosine Similarity Pearson | Cosine Similarity Spearman | Euclidean Pearson | Euclidean Spearman | Manhattan Pearson | Manhattan Spearman |
|---------|---------------------------|----------------------------|-------------------|--------------------|-------------------|--------------------|
| MTEB BIOSSES | 85.65459047085452 | 82.10729255710761 | 82.78079159312476 | 80.50002701880933 | 82.41372641383016 | 80.57412509272639 |
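The STS columns compare model similarity scores against human judgments using linear (Pearson) and rank-based (Spearman) correlation. A toy sketch with SciPy (assumed to be installed; the values are illustrative):

```python
from scipy.stats import pearsonr, spearmanr

# Model similarity scores vs. human similarity judgments (toy values).
model_scores = [0.10, 0.40, 0.55, 0.90]
human_scores = [1.0, 2.0, 3.5, 4.8]

pearson, _ = pearsonr(model_scores, human_scores)    # linear correlation
spearman, _ = spearmanr(model_scores, human_scores)  # rank correlation
```

Because the toy scores are perfectly monotonic in the judgments, Spearman is exactly 1.0 while Pearson is slightly lower; the same kind of gap appears between the Pearson and Spearman columns above.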
## 📄 License

This project is licensed under the apache-2.0 license.