
Text2vec Base Chinese Paraphrase

Developed by shibing624
A Chinese text vectorization model trained with the CoSENT method, supporting tasks such as sentence embedding, text matching, and semantic search.
Downloads: 45.88k
Release date: 6/19/2023

Model Overview

This model maps Chinese sentences to a 768-dimensional dense vector space and can be used for sentence embedding, text matching, or semantic search. It is built on the nghuyong/ernie-3.0-base-zh model, fine-tuned on an enhanced Chinese STS dataset, and achieves state-of-the-art (SOTA) results on several Chinese NLI test sets.
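CoSENT, named in the description above, trains the encoder so that a sentence pair with a higher similarity label receives a higher cosine score than a pair with a lower label. The NumPy sketch below implements the commonly cited form of the loss, log(1 + Σ exp(λ·(cos_low − cos_high))), summed over every pair of examples whose labels are ordered. The scale λ = 20 is the value usually quoted for CoSENT and is an assumption here, not taken from this model's training configuration; the function names are illustrative.

```python
import numpy as np


def paired_cosine(a, b):
    """Cosine similarity between row i of `a` and row i of `b`."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))


def cosent_loss(cos_scores, labels, scale=20.0):
    """CoSENT loss: log(1 + sum exp(scale * (cos_i - cos_j))) over all
    (i, j) with label_i < label_j, i.e. where pair i should score below pair j.
    scale=20.0 is an assumed default, not this model's confirmed setting."""
    s = scale * np.asarray(cos_scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    diff = s[:, None] - s[None, :]               # diff[i, j] = scale * (cos_i - cos_j)
    mask = y[:, None] < y[None, :]               # keep only label-ordered pairs
    terms = np.concatenate(([0.0], diff[mask]))  # the 0.0 term supplies the "1 +"
    return float(np.logaddexp.reduce(terms))     # numerically stable log-sum-exp


# Toy numbers: two sentence pairs, the first labelled as more similar.
good = cosent_loss([0.9, 0.1], labels=[1, 0])  # ranking agrees with the labels
bad = cosent_loss([0.1, 0.9], labels=[1, 0])   # ranking is reversed
print(good, bad)  # good is close to 0, bad is close to 16
```

Only the relative order of cosine scores matters to this loss, which is why it pairs naturally with a rank-based metric such as Spearman's correlation.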

Model Features

Trained on an enhanced Chinese STS dataset
The training data is an enhanced Chinese STS dataset that includes s2p (sentence-to-paragraph) pairs, which strengthens the model's ability to represent long texts.
SOTA performance
Achieves state-of-the-art performance on several Chinese NLI test sets, with an average Spearman correlation coefficient of 63.08.
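The Spearman correlation measures how well the ranking of the model's cosine scores agrees with the ranking of the gold similarity labels on a test set. Benchmark tables like this one usually report ρ × 100, so 63.08 most likely corresponds to ρ ≈ 0.63; that reading is an assumption. A minimal sketch computing ρ as the Pearson correlation of average ranks (ties share the mean rank); the scores and labels are made-up toy values:

```python
import numpy as np


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        # extend j over a run of equal values
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(predicted, gold):
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(predicted), average_ranks(gold))[0, 1])


# Toy example: model cosine scores vs. gold similarity labels for four pairs.
cos_scores = [0.10, 0.40, 0.35, 0.80]
gold_labels = [0, 1, 2, 3]
rho = spearman(cos_scores, gold_labels)
print(round(100 * rho, 2))  # prints 80.0 (the x100 scale used in benchmark tables)
```

In practice `scipy.stats.spearmanr` gives the same result; the hand-written version just makes the rank step explicit.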
Efficient inference
Reaches an inference throughput of 3066 QPS (queries per second), making it suitable for deployment in production environments.

Model Capabilities

Text vectorization
Sentence similarity calculation
Semantic search
Text matching
Feature extraction
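Text vectorization with a BERT-style encoder such as ERNIE 3.0 base (hidden size 768, matching the dimension stated above) ends with a pooling step that turns per-token vectors into one sentence vector. Mean pooling over the attention mask is common for text2vec models, but whether this exact checkpoint uses it is an assumption here. In practice, the text2vec package's `SentenceModel` class returns sentence vectors directly from its `encode` method; the sketch below only illustrates the pooling arithmetic on random stand-in data:

```python
import numpy as np


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of each sentence, ignoring padding.

    token_embeddings: (batch, seq_len, hidden) array, e.g. an encoder's last hidden state.
    attention_mask:   (batch, seq_len) array of 1 for real tokens, 0 for padding.
    Returns a (batch, hidden) array of sentence vectors.
    """
    emb = np.asarray(token_embeddings, dtype=float)
    mask = np.asarray(attention_mask, dtype=float)[:, :, None]  # broadcast over hidden
    summed = (emb * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


# Random stand-in for encoder output: 2 sentences, 5 tokens, hidden size 768.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 5, 768))
mask = np.array([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
sentence_vecs = mean_pool(hidden, mask)
print(sentence_vecs.shape)  # (2, 768)
```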

Use Cases

Information retrieval
Semantic search
Converts queries and documents into vectors and ranks documents by their similarity to the query, so that search is based on meaning rather than keywords.
Improves the relevance of search results.
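Semantic search as described here reduces to a nearest-neighbour lookup: normalize every vector to unit length, take dot products (which then equal cosine similarities), and return the highest-scoring documents. A minimal brute-force sketch; the toy 3-dimensional vectors stand in for the model's 768-dimensional embeddings, and `semantic_search` is a hypothetical helper, not part of any library:

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    x = np.asarray(x, dtype=float)
    return x / np.clip(np.linalg.norm(x, axis=-1, keepdims=True), 1e-12, None)


def semantic_search(query_vec, doc_vecs, top_k=3):
    """Return (doc_index, cosine_score) for the top_k documents, best first."""
    q = l2_normalize(query_vec)
    d = l2_normalize(doc_vecs)
    scores = d @ q                      # one cosine score per document
    best = np.argsort(-scores)[:top_k]  # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in best]


# Toy 3-d vectors stand in for the model's 768-d embeddings.
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [1.0, 0.05, 0.0]
results = semantic_search(query_vec, doc_vecs, top_k=3)
for idx, score in results:
    print(idx, round(score, 3))
```

A full scan is fine for small collections; for large corpora an approximate nearest-neighbour index is the usual replacement, with the same normalized vectors as input.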
Intelligent customer service
Question matching
Computes the similarity between a user's question and the questions in a knowledge base to answer questions automatically.
Improves the accuracy of the customer service system.
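Question matching works like semantic search with one extra decision: the best knowledge-base question is accepted only if its similarity clears a threshold, otherwise the system declines to answer. A minimal sketch; the threshold of 0.8, the `match_question` helper, and the toy 2-dimensional vectors are all hypothetical, and a real threshold would be tuned on labelled question pairs:

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    x = np.asarray(x, dtype=float)
    return x / np.clip(np.linalg.norm(x, axis=-1, keepdims=True), 1e-12, None)


def match_question(query_vec, kb_question_vecs, kb_answers, threshold=0.8):
    """Return (answer, score) for the closest knowledge-base question,
    or (None, score) when even the best match falls below `threshold`."""
    scores = l2_normalize(kb_question_vecs) @ l2_normalize(query_vec)
    best = int(np.argmax(scores))
    if scores[best] >= threshold:
        return kb_answers[best], float(scores[best])
    return None, float(scores[best])  # e.g. hand off to a human agent


# Toy 2-d vectors stand in for embeddings of the knowledge-base questions.
kb_vecs = [[1.0, 0.0], [0.0, 1.0]]
kb_answers = ["answer A", "answer B"]
print(match_question([0.9, 0.1], kb_vecs, kb_answers))  # close to question A
print(match_question([1.0, 1.0], kb_vecs, kb_answers))  # ambiguous: below threshold
```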
Text clustering
Document categorization
Groups similar documents by the distances between their vectors.
Enables unsupervised document classification.
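The use case above does not name a clustering algorithm. One common choice for sentence embeddings is k-means on unit-length vectors (spherical k-means), which groups documents by cosine similarity; it is swapped in here purely for illustration. The number of clusters `k`, the iteration count, and the toy 2-dimensional vectors are hypothetical:

```python
import numpy as np


def spherical_kmeans(vecs, k, n_iter=20, seed=0):
    """Cluster vectors by cosine similarity; returns one cluster label per row."""
    x = np.asarray(vecs, dtype=float)
    x = x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]  # copy of k random rows
    for _ in range(n_iter):
        labels = np.argmax(x @ centroids.T, axis=1)  # assign to most similar centroid
        for c in range(k):
            members = x[labels == c]
            if len(members):  # keep the old centroid if a cluster empties
                m = members.mean(axis=0)
                centroids[c] = m / max(np.linalg.norm(m), 1e-12)
    return labels


# Toy 2-d vectors: three documents near one direction, three near another.
vecs = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0], [0.1, 1.0], [0.0, 1.0], [0.2, 0.9]]
labels = spherical_kmeans(vecs, k=2)
print(labels)
```

Normalizing first means the assignment step maximizes cosine similarity, the same measure used for search and matching, so all three use cases can share one set of stored vectors.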