🚀 DMetaSoul/sbert-chinese-qmc-finance-v1-distill
This model is a distilled, lightweight version (only a 4-layer BERT) of the previously open-sourced financial question matching model. It is suitable for question matching scenarios in the financial domain, such as:
- 8 thousand yuan with 400 yuan in interest for 1000 days? VS How much is the daily interest for 10,000 yuan?
- Early repayment is calculated based on the full amount VS How to make a repayment when the payment fails?
- Why did my borrowing transaction fail? VS Why did the newly applied loan fail?
If a large pre-trained model is used directly for online inference, it places heavy demands on computing resources and makes it hard to meet business requirements for performance indicators such as latency and throughput. Here we use distillation to obtain a lightweight version of the large model. After distilling a 12-layer BERT down to a 4-layer BERT, the number of parameters drops to 44% of the original, latency is roughly halved, throughput is doubled, and accuracy falls by about 5% (for detailed results, see the evaluation section below).
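For illustration, the minimal sketch below (not part of the original card) scores a question pair with this model via cosine similarity of the sentence embeddings; it assumes sentence-transformers is installed as described in the Installation section and reuses the sample questions from the usage example.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('DMetaSoul/sbert-chinese-qmc-finance-v1-distill')

# Sample questions reused from the usage example below
question_a = "到期不能按时还款怎么办"
question_b = "剩余欠款还有多少?"

embeddings = model.encode([question_a, question_b])
score = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"similarity = {score:.4f}")  # higher scores mean closer question intent
```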
🚀 Quick Start
✨ Features
This model is a distilled, lightweight version of the previously open-sourced financial question matching model and is suitable for financial question matching scenarios. Distillation reduces the model size and improves latency and throughput, with a small sacrifice in accuracy.
📦 Installation
1. Sentence-Transformers
Install the necessary library through the sentence-transformers framework:
```bash
pip install -U sentence-transformers
```
💻 Usage Examples
Basic Usage
1. Sentence-Transformers
Use the following code to load the model and extract text representation vectors:
```python
from sentence_transformers import SentenceTransformer

sentences = ["到期不能按时还款怎么办", "剩余欠款还有多少?"]

model = SentenceTransformer('DMetaSoul/sbert-chinese-qmc-finance-v1-distill')
embeddings = model.encode(sentences)
print(embeddings)
```
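In a typical question matching deployment, a new user question is matched against a bank of known questions. The hedged sketch below uses sentence_transformers.util.semantic_search for this; the FAQ entries and the query are illustrative only, not from the original model card.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('DMetaSoul/sbert-chinese-qmc-finance-v1-distill')

# Illustrative FAQ bank (reusing the sample questions above)
faq = ["到期不能按时还款怎么办", "剩余欠款还有多少?"]
faq_embeddings = model.encode(faq, convert_to_tensor=True)

# Hypothetical user query
query_embedding = model.encode("如何申请提前还款", convert_to_tensor=True)

# Retrieve the most similar FAQ entry by cosine similarity
hits = util.semantic_search(query_embedding, faq_embeddings, top_k=1)[0]
print(faq[hits[0]['corpus_id']], hits[0]['score'])
```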
2. HuggingFace Transformers
If you don't want to use sentence-transformers, you can also load the model and extract text vectors through HuggingFace Transformers:
```python
from transformers import AutoTokenizer, AutoModel
import torch

# Mean pooling: average the token embeddings, taking the attention mask into account
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # first element contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

# Sentences we want embeddings for
sentences = ["到期不能按时还款怎么办", "剩余欠款还有多少?"]

# Load the model from the HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('DMetaSoul/sbert-chinese-qmc-finance-v1-distill')
model = AutoModel.from_pretrained('DMetaSoul/sbert-chinese-qmc-finance-v1-distill')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Pool token embeddings into one fixed-size vector per sentence
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print("Sentence embeddings:")
print(sentence_embeddings)
```
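If a similarity score is needed rather than the raw vectors, the pooled embeddings can be compared directly. This short continuation of the snippet above assumes cosine similarity as the comparison metric, matching typical SBERT usage.

```python
import torch.nn.functional as F

# Continues the snippet above: compare the two pooled sentence embeddings
normalized = F.normalize(sentence_embeddings, p=2, dim=1)
cosine = (normalized[0] @ normalized[1]).item()
print(f"cosine similarity: {cosine:.4f}")
```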
🔧 Technical Details
Evaluation
Here is a comparison with the corresponding teacher model before distillation:
| Property | Details |
|---|---|
| Model Type | The student model is a distilled version of the teacher model, from a 12-layer BERT (teacher) to a 4-layer BERT (student). |
| Training Data | Not mentioned in the original text. |
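As a quick sanity check of the 12-layer to 4-layer claim, the published checkpoint's configuration can be inspected with transformers. This is a minimal sketch; the exact parameter count may differ slightly from the rounded 45M figure in the table below.

```python
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained('DMetaSoul/sbert-chinese-qmc-finance-v1-distill')
print(config.num_hidden_layers)  # expected: 4 for the distilled student

model = AutoModel.from_pretrained('DMetaSoul/sbert-chinese-qmc-finance-v1-distill')
print(sum(p.numel() for p in model.parameters()))  # roughly 45M parameters
```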
Performance:

| | Teacher | Student | Gap |
|---|---|---|---|
| Model | BERT-12-layers (102M) | BERT-4-layers (45M) | 0.44x |
| Cost | 23s | 12s | -47% |
| Latency | 38ms | 20ms | -47% |
| Throughput | 418 sentences/s | 791 sentences/s | 1.9x |
Accuracy:

| | csts_dev | csts_test | afqmc | lcqmc | bqcorpus | pawsx | xiaobu | Avg |
|---|---|---|---|---|---|---|---|---|
| Teacher | 77.40% | 74.55% | 36.00% | 75.75% | 73.24% | 11.58% | 54.75% | 57.61% |
| Student | 75.02% | 71.99% | 32.40% | 67.06% | 66.35% | 7.57% | 49.26% | 52.80% |
| Gap (abs.) | - | - | - | - | - | - | - | -4.81% |
Tested on 10,000 samples; GPU: V100; batch_size = 16; max_seq_len = 256.
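The setting above is how the reported numbers were obtained; a rough, hypothetical way to reproduce a throughput measurement with sentence-transformers is sketched below (results will vary with hardware and the actual corpus).

```python
import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('DMetaSoul/sbert-chinese-qmc-finance-v1-distill', device='cuda')
model.max_seq_length = 256  # match the reported setting

# Placeholder corpus of 10,000 sentences (a sample question repeated for illustration)
sentences = ["到期不能按时还款怎么办"] * 10000

start = time.perf_counter()
model.encode(sentences, batch_size=16, show_progress_bar=False)
elapsed = time.perf_counter() - start
print(f"throughput: {len(sentences) / elapsed:.0f} sentences/s")
```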
📄 License
No license information provided in the original text.
Citing & Authors
E-mail: xiaowenbin@dmetasoul.com