# DMetaSoul/sbert-chinese-general-v2-distill
This model is a distilled version (a 4-layer BERT) of the previously open-sourced general-purpose semantic matching model and is intended for general semantic matching scenarios. In terms of performance, it offers better generalization ability and a noticeably faster encoding speed across a variety of tasks.
Serving a large pre-trained model directly for online inference places heavy demands on computing resources and makes it hard to meet business latency and throughput targets. We therefore use distillation to make the model lightweight. After distilling the 12-layer BERT down to a 4-layer BERT, the number of parameters shrinks to 44% of the original, latency is roughly halved, throughput is doubled, and accuracy drops by about 6% (see the Evaluation section below for detailed results).
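The exact distillation recipe used to produce this model is not included here. The following is only a minimal sketch of embedding-level distillation with the classic sentence-transformers `fit` API, where a student is trained to reproduce the teacher's sentence vectors through an MSE loss; the student initialization (`student_base_model`) and the training sentences are placeholders, not the actual setup.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Teacher: the original 12-layer model. Student: a smaller encoder to be trained
# ('student_base_model' is a hypothetical placeholder; its embedding size must match the teacher's).
teacher = SentenceTransformer('DMetaSoul/sbert-chinese-general-v2')
student = SentenceTransformer('student_base_model')

# Unlabeled sentences used to transfer knowledge (placeholder data).
train_sentences = ["今天天气不错", "我想去看电影", "这家餐厅的菜很好吃"]

# Label every sentence with the teacher's embedding; the student learns to imitate it.
teacher_embeddings = teacher.encode(train_sentences)
train_examples = [InputExample(texts=[s], label=e) for s, e in zip(train_sentences, teacher_embeddings)]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.MSELoss(model=student)  # MSE between student and teacher sentence embeddings

student.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```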
## 🚀 Quick Start
## ✨ Features
- A distilled version of the previously open-sourced general-purpose semantic matching model, suitable for general semantic matching scenarios.
- Better generalization ability and faster encoding speed across a variety of tasks.
- After distillation, the parameter count is reduced, latency is roughly halved, and throughput is doubled, with only a small drop in accuracy.
## 📦 Installation
You can install the required library with the following command:

```bash
pip install -U sentence-transformers
```
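As an optional sanity check, you can confirm the installation by importing the package and printing its version:

```python
import sentence_transformers

# If the import succeeds, the library is available; the version is printed for reference.
print(sentence_transformers.__version__)
```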
## 💻 Usage Examples
### Basic Usage
You can use the model through the sentence-transformers framework. Here is an example of loading the model and extracting text representation vectors:
```python
from sentence_transformers import SentenceTransformer

sentences = ["我的儿子！他猛然间喊道，我的儿子在哪儿？", "我的儿子呢！他突然喊道，我的儿子在哪里？"]

# Load the distilled model and encode the sentences into dense vectors.
model = SentenceTransformer('DMetaSoul/sbert-chinese-general-v2-distill')
embeddings = model.encode(sentences)
print(embeddings)
```
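Since the embeddings are intended for semantic matching, a typical follow-up is to score the sentence pair by cosine similarity. This snippet is an illustrative addition using `sentence_transformers.util.cos_sim`:

```python
from sentence_transformers import util

# Cosine similarity between the two embeddings computed above;
# values close to 1.0 indicate semantically similar sentences.
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(similarity)
```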
### Advanced Usage
If you don't want to use sentence-transformers, you can also load the model and extract text vectors through HuggingFace Transformers:
```python
import torch
from transformers import AutoTokenizer, AutoModel

# Mean pooling: average the token embeddings, using the attention mask
# so that padding tokens do not contribute to the sentence embedding.
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # first element contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

sentences = ["我的儿子！他猛然间喊道，我的儿子在哪儿？", "我的儿子呢！他突然喊道，我的儿子在哪里？"]

# Load the tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained('DMetaSoul/sbert-chinese-general-v2-distill')
model = AutoModel.from_pretrained('DMetaSoul/sbert-chinese-general-v2-distill')

# Tokenize and run the model; no gradients are needed for inference.
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    model_output = model(**encoded_input)

# Apply mean pooling to obtain sentence embeddings.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print("Sentence embeddings:")
print(sentence_embeddings)
```
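When working directly with the Transformers outputs, cosine similarity can be computed in plain PyTorch. This is an illustrative addition, not part of the original example:

```python
import torch.nn.functional as F

# L2-normalize the embeddings so that their dot product equals cosine similarity.
normalized = F.normalize(sentence_embeddings, p=2, dim=1)
cosine_scores = normalized @ normalized.T
print(cosine_scores)
```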
## 📚 Documentation
### Evaluation
Here is a comparison with the corresponding teacher model before distillation:
Performance:
|            | Teacher               | Student             | Gap   |
| ---------- | --------------------- | ------------------- | ----- |
| Model      | BERT-12-layers (102M) | BERT-4-layers (45M) | 0.44x |
| Cost       | 23s                   | 12s                 | -47%  |
| Latency    | 38ms                  | 20ms                | -47%  |
| Throughput | 418 sentences/s       | 791 sentences/s     | 1.9x  |
Accuracy:
|            | csts_dev | csts_test | afqmc  | lcqmc  | bqcorpus | pawsx  | xiaobu | Avg    |
| ---------- | -------- | --------- | ------ | ------ | -------- | ------ | ------ | ------ |
| Teacher    | 77.19%   | 72.59%    | 36.79% | 76.91% | 49.62%   | 16.24% | 63.15% | 56.07% |
| Student    | 76.49%   | 73.33%    | 26.46% | 64.26% | 46.02%   | 11.83% | 52.45% | 50.12% |
| Gap (abs.) | -        | -         | -      | -      | -        | -      | -      | -5.95% |
Tested on 10,000 samples with a V100 GPU, batch_size=16, max_seq_len=256.
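The exact benchmarking script is not provided; the sketch below only illustrates how encoding time and throughput could be measured under similar settings (10,000 sentences, batch_size=16). The corpus here is a repeated placeholder sentence rather than the actual test data:

```python
import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('DMetaSoul/sbert-chinese-general-v2-distill')

# Placeholder corpus: 10,000 copies of one sentence; a real benchmark would use varied text.
corpus = ["我的儿子！他猛然间喊道，我的儿子在哪儿？"] * 10000

start = time.time()
model.encode(corpus, batch_size=16, show_progress_bar=False)
elapsed = time.time() - start

print(f"total: {elapsed:.1f}s, throughput: {len(corpus) / elapsed:.0f} sentences/s")
```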
## 📄 License
E-mail: xiaowenbin@dmetasoul.com