Text2vec Base Multilingual
A multilingual text vectorization model focused on sentence similarity calculation and feature extraction tasks
Release Time: 9/20/2023
Model Overview
This multilingual text embedding model converts text in different languages into vector representations, used primarily for sentence similarity calculation, text classification, and clustering. It supports multiple languages, including Chinese, English, German, and French.
Model Features
Multilingual Support
Supports text vectorization in multiple languages including Chinese, English, German, French, and more
Sentence Similarity Calculation
Specifically optimized for calculating semantic similarity between different sentences
Cross-Lingual Capability
Compares the semantic similarity of texts written in different languages
Efficient Feature Extraction
Can quickly extract semantic feature representations of texts
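Sentence similarity between embeddings is commonly measured with cosine similarity. The sketch below assumes the embeddings have already been produced by the model; the short vectors are hypothetical placeholders rather than real model output, and `cosine_similarity` is an illustrative helper name, not part of any model API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional embeddings (assumption: real embeddings are
# produced by the model and have many more dimensions).
emb_en = [0.20, 0.70, 0.10, 0.40]        # e.g. an English sentence
emb_de = [0.25, 0.65, 0.05, 0.45]        # e.g. its German translation
emb_other = [0.90, -0.10, 0.30, -0.20]   # e.g. an unrelated sentence

# A cross-lingual pair with the same meaning should score higher
# than an unrelated pair.
print(cosine_similarity(emb_en, emb_de))
print(cosine_similarity(emb_en, emb_other))
```

Because cosine similarity ignores vector length, it compares direction only, which is why it is a common default for comparing sentences of different lengths or languages.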
Model Capabilities
Text vectorization
Sentence similarity calculation
Cross-lingual text comparison
Text feature extraction
Text classification
Text clustering
Use Cases
E-commerce
Multilingual Review Classification
Classifying Amazon product reviews written in different languages
In the MTEB Amazon multilingual review classification task, English accuracy was 33.13% and Chinese accuracy was 32.52%
Review Polarity Analysis
Analyzing the sentiment polarity (positive/negative) of product reviews
Achieved 66.10% accuracy in the MTEB Amazon polarity classification task
Intent Recognition
Multilingual Intent Classification
Identifying the intent category of user queries
In the MTEB MTOP intent classification task, English accuracy was 62.79% and German accuracy was 55.29%
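The review and intent classification use cases both come down to fitting a lightweight classifier on top of fixed embeddings and measuring accuracy on held-out examples. The sketch below uses a nearest-centroid classifier as a simple stand-in; it does not reproduce the MTEB evaluation protocol, and the 2-dimensional vectors, labels, and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Sequence[float], str]


def fit_centroids(examples: Sequence[Example]) -> Dict[str, List[float]]:
    """Average the embeddings of each label into one centroid per label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for vector, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(vector)
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids: Dict[str, List[float]], vector: Sequence[float]) -> str:
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    def sq_dist(centroid: Sequence[float]) -> float:
        return sum((c - v) ** 2 for c, v in zip(centroid, vector))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids: Dict[str, List[float]], examples: Sequence[Example]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for vector, label in examples if predict(centroids, vector) == label)
    return correct / len(examples)


# Hypothetical embeddings standing in for model output.
train = [
    ([0.9, 0.1], "positive"),
    ([0.8, 0.2], "positive"),
    ([0.1, 0.9], "negative"),
    ([0.2, 0.8], "negative"),
]
held_out = [([0.7, 0.3], "positive"), ([0.3, 0.7], "negative")]

centroids = fit_centroids(train)
print(accuracy(centroids, held_out))
```

The same pattern applies to multilingual data: as long as all languages are embedded into the same vector space, a classifier fitted on embeddings from one language can be evaluated on another.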
Academic Research
Paper Clustering
Thematic clustering of academic papers
Achieved 32.32% V-measure in the MTEB arXiv paper clustering P2P task
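V-measure, the clustering metric quoted above, is the harmonic mean of homogeneity (each cluster contains a single class) and completeness (each class falls into a single cluster). The sketch below computes it from two label lists in plain Python; the labels are hypothetical and the helper names are illustrative. scikit-learn offers the same metric as `sklearn.metrics.v_measure_score`.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def _entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target: Sequence[Hashable], given: Sequence[Hashable]) -> float:
    """Conditional entropy H(target | given) from joint label counts."""
    n = len(target)
    joint = Counter(zip(target, given))
    given_counts = Counter(given)
    return -sum(
        (c / n) * math.log(c / given_counts[g]) for (_, g), c in joint.items()
    )


def v_measure(true_labels: Sequence[Hashable], cluster_labels: Sequence[Hashable]) -> float:
    """Harmonic mean of homogeneity and completeness, in [0, 1]."""
    if len(true_labels) != len(cluster_labels) or len(true_labels) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    h_true = _entropy(true_labels)
    h_cluster = _entropy(cluster_labels)
    # Clamp at 0 to absorb tiny negative values from floating-point rounding.
    homogeneity = 1.0 if h_true == 0 else max(
        0.0, 1.0 - _conditional_entropy(true_labels, cluster_labels) / h_true
    )
    completeness = 1.0 if h_cluster == 0 else max(
        0.0, 1.0 - _conditional_entropy(cluster_labels, true_labels) / h_cluster
    )
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Hypothetical paper topics and cluster assignments.
topics = ["physics", "physics", "math", "math"]
clusters = [1, 1, 0, 0]  # cluster ids are arbitrary; only the grouping matters
print(v_measure(topics, clusters))
```

The score depends only on how items are grouped, not on the cluster ids themselves, so swapping the ids of two clusters leaves it unchanged.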