# DMetaSoul/sbert-chinese-general-v2-distill
This model is a distilled version (a 4-layer BERT) of the previously open-sourced general-purpose semantic matching model and is intended for general semantic matching scenarios. In terms of performance, it offers better generalization ability and a noticeably faster encoding speed across a variety of tasks.
Serving a large pre-trained model directly for online inference places heavy demands on computing resources and makes it hard to meet business latency and throughput targets. We therefore use distillation to make the model lightweight. After distilling the 12-layer BERT down to a 4-layer BERT, the number of parameters shrinks to 44% of the original, latency is roughly halved, throughput is doubled, and accuracy drops by about 6% (see the Evaluation section below for detailed results).
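The exact distillation recipe used to produce this model is not included here. The following is only a minimal sketch of embedding-level distillation with the classic sentence-transformers `fit` API, where a student is trained to reproduce the teacher's sentence vectors through an MSE loss; the student initialization (`student_base_model`) and the training sentences are placeholders, not the actual setup.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Teacher: the original 12-layer model. Student: a smaller encoder to be trained
# ('student_base_model' is a hypothetical placeholder; its embedding size must match the teacher's).
teacher = SentenceTransformer('DMetaSoul/sbert-chinese-general-v2')
student = SentenceTransformer('student_base_model')

# Unlabeled sentences used to transfer knowledge (placeholder data).
train_sentences = ["今天天气不错", "我想去看电影", "这家餐厅的菜很好吃"]

# Label every sentence with the teacher's embedding; the student learns to imitate it.
teacher_embeddings = teacher.encode(train_sentences)
train_examples = [InputExample(texts=[s], label=e) for s, e in zip(train_sentences, teacher_embeddings)]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.MSELoss(model=student)  # MSE between student and teacher sentence embeddings

student.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```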
## 🚀 Quick Start
## ✨ Features
- A distilled version of the previously open-sourced general-purpose semantic matching model, suitable for general semantic matching scenarios.
- Better generalization ability and faster encoding speed across a variety of tasks.
- After distillation, the parameter count is reduced, latency is roughly halved, and throughput is doubled, with only a small drop in accuracy.
## 📦 Installation
You can install the required library with the following command:

```bash
pip install -U sentence-transformers
```
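As an optional sanity check, you can confirm the installation by importing the package and printing its version:

```python
import sentence_transformers

# If the import succeeds, the library is available; the version is printed for reference.
print(sentence_transformers.__version__)
```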
## 💻 Usage Examples
### Basic Usage
You can use the model through the sentence-transformers framework. Here is an example of loading the model and extracting text representation vectors:
```python
from sentence_transformers import SentenceTransformer

sentences = ["我的儿子！他猛然间喊道，我的儿子在哪儿？", "我的儿子呢！他突然喊道，我的儿子在哪里？"]

# Load the distilled model and encode the sentences into dense vectors.
model = SentenceTransformer('DMetaSoul/sbert-chinese-general-v2-distill')
embeddings = model.encode(sentences)
print(embeddings)
```
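Since the embeddings are intended for semantic matching, a typical follow-up is to score the sentence pair by cosine similarity. This snippet is an illustrative addition using `sentence_transformers.util.cos_sim`:

```python
from sentence_transformers import util

# Cosine similarity between the two embeddings computed above;
# values close to 1.0 indicate semantically similar sentences.
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(similarity)
```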
### Advanced Usage
If you don't want to use sentence-transformers, you can also load the model and extract text vectors through HuggingFace Transformers:
```python
import torch
from transformers import AutoTokenizer, AutoModel

# Mean pooling: average the token embeddings, using the attention mask
# so that padding tokens do not contribute to the sentence embedding.
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # first element contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

sentences = ["我的儿子！他猛然间喊道，我的儿子在哪儿？", "我的儿子呢！他突然喊道，我的儿子在哪里？"]

# Load the tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained('DMetaSoul/sbert-chinese-general-v2-distill')
model = AutoModel.from_pretrained('DMetaSoul/sbert-chinese-general-v2-distill')

# Tokenize and run the model; no gradients are needed for inference.
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    model_output = model(**encoded_input)

# Apply mean pooling to obtain sentence embeddings.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print("Sentence embeddings:")
print(sentence_embeddings)
```
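When working directly with the Transformers outputs, cosine similarity can be computed in plain PyTorch. This is an illustrative addition, not part of the original example:

```python
import torch.nn.functional as F

# L2-normalize the embeddings so that their dot product equals cosine similarity.
normalized = F.normalize(sentence_embeddings, p=2, dim=1)
cosine_scores = normalized @ normalized.T
print(cosine_scores)
```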
## 📚 Documentation
### Evaluation
Here is a comparison with the corresponding teacher model before distillation:
Performance:
|            | Teacher               | Student             | Gap   |
| ---------- | --------------------- | ------------------- | ----- |
| Model      | BERT-12-layers (102M) | BERT-4-layers (45M) | 0.44x |
| Cost       | 23s                   | 12s                 | -47%  |
| Latency    | 38ms                  | 20ms                | -47%  |
| Throughput | 418 sentences/s       | 791 sentences/s     | 1.9x  |
Accuracy:
|            | csts_dev | csts_test | afqmc  | lcqmc  | bqcorpus | pawsx  | xiaobu | Avg    |
| ---------- | -------- | --------- | ------ | ------ | -------- | ------ | ------ | ------ |
| Teacher    | 77.19%   | 72.59%    | 36.79% | 76.91% | 49.62%   | 16.24% | 63.15% | 56.07% |
| Student    | 76.49%   | 73.33%    | 26.46% | 64.26% | 46.02%   | 11.83% | 52.45% | 50.12% |
| Gap (abs.) | -        | -         | -      | -      | -        | -      | -      | -5.95% |
Tested on 10,000 samples with a V100 GPU, batch_size=16, max_seq_len=256.
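The exact benchmarking script is not provided; the sketch below only illustrates how encoding time and throughput could be measured under similar settings (10,000 sentences, batch_size=16). The corpus here is a repeated placeholder sentence rather than the actual test data:

```python
import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('DMetaSoul/sbert-chinese-general-v2-distill')

# Placeholder corpus: 10,000 copies of one sentence; a real benchmark would use varied text.
corpus = ["我的儿子！他猛然间喊道，我的儿子在哪儿？"] * 10000

start = time.time()
model.encode(corpus, batch_size=16, show_progress_bar=False)
elapsed = time.time() - start

print(f"total: {elapsed:.1f}s, throughput: {len(corpus) / elapsed:.0f} sentences/s")
```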
## 📄 License
E-mail: xiaowenbin@dmetasoul.com