
Vietnamese Embedding

Developed by dangvantuan
An embedding model designed specifically for Vietnamese. Built on PhoBERT, it encodes Vietnamese sentences into a 768-dimensional vector space and is suited to tasks such as semantic search and text clustering.
Downloads: 6,063
Release date: 4/20/2024

Model Overview

This model is built on PhoBERT, a pre-trained Vietnamese language model based on the RoBERTa architecture. It captures Vietnamese vocabulary and contextual meaning to produce high-quality sentence embeddings.
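PhoBERT, like other RoBERTa-style encoders, outputs one vector per token, so a sentence-embedding model needs a pooling step to reduce those vectors to a single 768-dimensional sentence vector. This page does not say which pooling this model uses; the sketch below assumes mean pooling, a common choice for sentence-embedding models. `mean_pool` is a hypothetical helper, and the 4-dimensional toy vectors stand in for real 768-dimensional ones.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the vectors of real tokens (mask == 1), skipping padding."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("need exactly one mask entry per token vector")
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    # Component-wise mean over the kept token vectors.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: three 4-dimensional token vectors (a PhoBERT-base encoder
# would emit 768-dimensional ones); the third token is padding.
tokens = [[1.0, 2.0, 0.0, 4.0],
          [3.0, 0.0, 2.0, 0.0],
          [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 1.0, 1.0, 2.0]
```

The attention mask matters because padded batches contain filler tokens whose vectors would otherwise skew the average.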

Model Features

Vietnamese Optimization
Designed and optimized specifically for Vietnamese vocabulary and grammatical structures.
Multi-stage Training
Trained in four successive stages: initial training, continued fine-tuning, fine-tuning on an STS benchmark, and fine-tuning with advanced data augmentation.
High Performance
Strong results on Vietnamese STS (semantic textual similarity) datasets, with Pearson and Spearman correlations both above 88%.
Strong Semantic Capture
Captures the meaning and contextual relationships within Vietnamese sentences.
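The Pearson and Spearman figures under High Performance are typically obtained by comparing the model's similarity scores for sentence pairs with human similarity ratings; a value such as 88% corresponds to a coefficient of 0.88. The minimal pure-Python sketch below computes both coefficients. The score lists are made-up toy values, not results for this model, and in practice `scipy.stats.pearsonr` and `scipy.stats.spearmanr` do the same job.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            result[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data: human similarity ratings (0-5 scale) and model cosine scores.
gold = [0.0, 1.5, 3.0, 4.5, 5.0]
predicted = [0.10, 0.35, 0.30, 0.80, 0.95]
print(round(pearson(gold, predicted), 3))
print(round(spearman(gold, predicted), 3))  # 0.9
```

Pearson measures how linearly the scores track the ratings, while Spearman only checks whether the model orders the pairs the same way humans do, which is why STS results usually report both.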

Model Capabilities

Sentence Embedding
Semantic Search
Text Clustering
Sentence Similarity Calculation

Use Cases

Natural Language Processing
Semantic Search
Used to build Vietnamese semantic search engines, improving the relevance of search results.
More accurately matches the semantics of queries and documents
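In a semantic search engine, the query and the documents are embedded with the model and the documents are ranked by their similarity to the query. The sketch below assumes cosine similarity as the score and assumes the embeddings are already computed (for example with `SentenceTransformer.encode` from the sentence-transformers library, if the model is loaded that way). `search` is a hypothetical helper, and the 3-dimensional toy vectors stand in for 768-dimensional embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(query_vec: Sequence[float],
           doc_vecs: Sequence[Sequence[float]],
           top_k: int = 3) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    scored = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors standing in for 768-dimensional embeddings.
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0],   # unrelated document
        [0.9, 0.1, 0.0],   # close paraphrase
        [1.0, 0.0, 0.0]]   # same meaning as the query
print([i for i, _ in search(query, docs, top_k=2)])  # [2, 1]
```

This linear scan is fine for small collections; large ones would usually move to an approximate nearest-neighbor index.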
Text Clustering
Performs clustering analysis on Vietnamese texts to discover themes and patterns.
Generates high-quality text clustering results
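Clustering works on the same embeddings: sentences with similar meaning end up near each other, so a standard algorithm such as k-means can group them. The sketch below is plain k-means on unit-normalized vectors, where squared Euclidean distance equals 2 minus twice the cosine similarity, so assignments follow cosine similarity. The first-k-points initialization is a simplification chosen for determinism, `kmeans` is a hypothetical helper, and the 2-dimensional vectors are toy stand-ins; a production setup would more likely use scikit-learn's `KMeans` with k-means++ initialization.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors: Sequence[Sequence[float]], k: int,
           iterations: int = 20) -> List[int]:
    """Cluster unit-normalized vectors; returns one cluster label per vector."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    points = [normalize(v) for v in vectors]
    # Simplified initialization: use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy embeddings: two sentences near the x-axis, two near the y-axis.
vectors = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.0], [0.1, 0.95]]
print(kmeans(vectors, k=2))  # [0, 1, 0, 1]
```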
Sentence Similarity Calculation
Calculates the semantic similarity between two Vietnamese sentences.
Pearson correlation coefficient reaches 88.33%
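Sentence similarity is commonly scored as the cosine of the two sentence embeddings, a value between -1 and 1. The sketch below assumes the embeddings are already computed and uses made-up 2-dimensional vectors; it builds a full cosine similarity matrix and picks the most similar pair of sentences. `similarity_matrix` and `most_similar_pair` are hypothetical helper names.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Cosine similarity for every pair (i, j), including i == j."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def most_similar_pair(vectors: Sequence[Sequence[float]]) -> Tuple[Tuple[int, int], float]:
    """Return the index pair (i < j) with the highest cosine similarity.

    Needs at least two vectors; max() raises ValueError otherwise.
    """
    best = max(combinations(range(len(vectors)), 2),
               key=lambda ij: cosine(vectors[ij[0]], vectors[ij[1]]))
    return best, cosine(vectors[best[0]], vectors[best[1]])


embeddings = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
pair, score = most_similar_pair(embeddings)
print(pair, round(score, 3))  # (0, 1) 0.8
```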
© 2025 AIbase