🚀 DMetaSoul/sbert-chinese-qmc-finance-v1-distill
This model is a distilled, lightweight version (only a 4-layer BERT) of the previously open-sourced financial question matching model. It is suitable for question matching scenarios in the financial domain, such as:
- 8 thousand yuan with 400 yuan in interest for 1000 days? VS How much is the daily interest for 10,000 yuan?
- Early repayment is calculated based on the full amount VS How to make a repayment when the payment fails?
- Why did my borrowing transaction fail? VS Why did the newly applied loan fail?
If a large pre-trained model is used directly for online inference, it places heavy demands on computing resources and makes it hard to meet business requirements for performance indicators such as latency and throughput. Here we use distillation to obtain a lightweight version of the large model. After distilling a 12-layer BERT down to a 4-layer BERT, the number of parameters drops to 44% of the original, latency is roughly halved, throughput is doubled, and accuracy falls by about 5% (for detailed results, see the evaluation section below).
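For illustration, the minimal sketch below (not part of the original card) scores a question pair with this model via cosine similarity of the sentence embeddings; it assumes sentence-transformers is installed as described in the Installation section and reuses the sample questions from the usage example.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('DMetaSoul/sbert-chinese-qmc-finance-v1-distill')

# Sample questions reused from the usage example below
question_a = "到期不能按时还款怎么办"
question_b = "剩余欠款还有多少?"

embeddings = model.encode([question_a, question_b])
score = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"similarity = {score:.4f}")  # higher scores mean closer question intent
```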
🚀 Quick Start
✨ Features
This model is a distilled, lightweight version of the previously open-sourced financial question matching model and is suitable for financial question matching scenarios. Distillation reduces the model size and improves latency and throughput, with a small sacrifice in accuracy.
📦 Installation
1. Sentence-Transformers
Install the necessary library through the sentence-transformers framework:
```bash
pip install -U sentence-transformers
```
💻 Usage Examples
Basic Usage
1. Sentence-Transformers
Use the following code to load the model and extract text representation vectors:
```python
from sentence_transformers import SentenceTransformer

sentences = ["到期不能按时还款怎么办", "剩余欠款还有多少?"]

model = SentenceTransformer('DMetaSoul/sbert-chinese-qmc-finance-v1-distill')
embeddings = model.encode(sentences)
print(embeddings)
```
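In a typical question matching deployment, a new user question is matched against a bank of known questions. The hedged sketch below uses sentence_transformers.util.semantic_search for this; the FAQ entries and the query are illustrative only, not from the original model card.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('DMetaSoul/sbert-chinese-qmc-finance-v1-distill')

# Illustrative FAQ bank (reusing the sample questions above)
faq = ["到期不能按时还款怎么办", "剩余欠款还有多少?"]
faq_embeddings = model.encode(faq, convert_to_tensor=True)

# Hypothetical user query
query_embedding = model.encode("如何申请提前还款", convert_to_tensor=True)

# Retrieve the most similar FAQ entry by cosine similarity
hits = util.semantic_search(query_embedding, faq_embeddings, top_k=1)[0]
print(faq[hits[0]['corpus_id']], hits[0]['score'])
```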
2. HuggingFace Transformers
If you don't want to use sentence-transformers, you can also load the model and extract text vectors through HuggingFace Transformers:
```python
from transformers import AutoTokenizer, AutoModel
import torch

# Mean pooling: average the token embeddings, taking the attention mask into account
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # first element contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

# Sentences we want embeddings for
sentences = ["到期不能按时还款怎么办", "剩余欠款还有多少?"]

# Load the model from the HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('DMetaSoul/sbert-chinese-qmc-finance-v1-distill')
model = AutoModel.from_pretrained('DMetaSoul/sbert-chinese-qmc-finance-v1-distill')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Pool token embeddings into one fixed-size vector per sentence
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print("Sentence embeddings:")
print(sentence_embeddings)
```
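If a similarity score is needed rather than the raw vectors, the pooled embeddings can be compared directly. This short continuation of the snippet above assumes cosine similarity as the comparison metric, matching typical SBERT usage.

```python
import torch.nn.functional as F

# Continues the snippet above: compare the two pooled sentence embeddings
normalized = F.normalize(sentence_embeddings, p=2, dim=1)
cosine = (normalized[0] @ normalized[1]).item()
print(f"cosine similarity: {cosine:.4f}")
```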
🔧 Technical Details
Evaluation
Here is a comparison with the corresponding teacher model before distillation:
| Property | Details |
|---|---|
| Model Type | The student model is a distilled version of the teacher model, from a 12-layer BERT (teacher) to a 4-layer BERT (student). |
| Training Data | Not mentioned in the original text. |
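As a quick sanity check of the 12-layer to 4-layer claim, the published checkpoint's configuration can be inspected with transformers. This is a minimal sketch; the exact parameter count may differ slightly from the rounded 45M figure in the table below.

```python
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained('DMetaSoul/sbert-chinese-qmc-finance-v1-distill')
print(config.num_hidden_layers)  # expected: 4 for the distilled student

model = AutoModel.from_pretrained('DMetaSoul/sbert-chinese-qmc-finance-v1-distill')
print(sum(p.numel() for p in model.parameters()))  # roughly 45M parameters
```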
Performance:

| | Teacher | Student | Gap |
|---|---|---|---|
| Model | BERT-12-layers (102M) | BERT-4-layers (45M) | 0.44x |
| Cost | 23s | 12s | -47% |
| Latency | 38ms | 20ms | -47% |
| Throughput | 418 sentences/s | 791 sentences/s | 1.9x |
Accuracy:

| | csts_dev | csts_test | afqmc | lcqmc | bqcorpus | pawsx | xiaobu | Avg |
|---|---|---|---|---|---|---|---|---|
| Teacher | 77.40% | 74.55% | 36.00% | 75.75% | 73.24% | 11.58% | 54.75% | 57.61% |
| Student | 75.02% | 71.99% | 32.40% | 67.06% | 66.35% | 7.57% | 49.26% | 52.80% |
| Gap (abs.) | - | - | - | - | - | - | - | -4.81% |
Tested on 10,000 samples; GPU: V100; batch_size = 16; max_seq_len = 256.
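The setting above is how the reported numbers were obtained; a rough, hypothetical way to reproduce a throughput measurement with sentence-transformers is sketched below (results will vary with hardware and the actual corpus).

```python
import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('DMetaSoul/sbert-chinese-qmc-finance-v1-distill', device='cuda')
model.max_seq_length = 256  # match the reported setting

# Placeholder corpus of 10,000 sentences (a sample question repeated for illustration)
sentences = ["到期不能按时还款怎么办"] * 10000

start = time.perf_counter()
model.encode(sentences, batch_size=16, show_progress_bar=False)
elapsed = time.perf_counter() - start
print(f"throughput: {len(sentences) / elapsed:.0f} sentences/s")
```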
📄 License
No license information provided in the original text.
Citing & Authors
E-mail: xiaowenbin@dmetasoul.com