Vietnamese Document Embedding

Developed by dangvantuan
A document embedding model for Vietnamese that supports contexts of up to 8096 tokens and is built on gte-multilingual
Downloads 77.61k
Release Time: 8/15/2024

Model Overview

This is a long-text embedding model trained specifically for Vietnamese. It generates precise, contextually relevant sentence embeddings and is suited to tasks such as semantic similarity calculation and document retrieval over Vietnamese text.

Model Features

Long Text Support
Supports contexts up to 8096 tokens, suitable for processing long Vietnamese documents
Multi-Stage Training
Trained in two stages, on XNLI natural language inference data and on STS semantic similarity data, to improve model performance
Advanced Loss Functions
Trained with multiple negatives ranking loss, Matryoshka2dLoss, and a similarity loss

Model Capabilities

Vietnamese text embedding
Sentence similarity calculation
Document retrieval
Semantic feature extraction
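
As a rough illustration of the embedding and similarity capabilities listed above, the following sketch loads the model through the sentence-transformers library and compares two sentences. The Hub identifier `dangvantuan/vietnamese-document-embedding` and the need for `trust_remote_code=True` are assumptions inferred from the developer and model names, so check the model page before relying on them. The helper names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Encoder = Callable[[List[str]], List[List[float]]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def load_encoder(model_id: str = "dangvantuan/vietnamese-document-embedding") -> Encoder:
    """Return a function mapping texts to embedding vectors.

    Assumption: the model is available under this Hub id and ships custom
    code that requires trust_remote_code=True.
    """
    # Imported lazily so the similarity helpers work without the dependency.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id, trust_remote_code=True)
    return lambda texts: model.encode(list(texts)).tolist()


def sentence_similarity(encode: Encoder, first: str, second: str) -> float:
    """Embed two sentences and return their cosine similarity."""
    vec_a, vec_b = encode([first, second])
    return cosine_similarity(vec_a, vec_b)
```

A call such as `sentence_similarity(load_encoder(), "Hà Nội là thủ đô của Việt Nam", "Thủ đô của Việt Nam là Hà Nội")` would download the model on first use; passing the encoder in as a parameter keeps the similarity logic testable without it.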

Use Cases

Text Retrieval
Vietnamese Document Retrieval
Use this model to generate embeddings for Vietnamese documents, which can then be indexed and searched to build an efficient document retrieval system
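
The retrieval use case can be sketched as a brute-force nearest-neighbour search over precomputed document embeddings. This is a minimal sketch, assuming the vectors have already been produced by the model; `rank_documents` and the toy two-dimensional vectors are hypothetical, and a production system would more likely use a vector index than a linear scan.

```python
import math
from typing import List, Sequence, Tuple


def _normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def rank_documents(
    query_vec: Sequence[float],
    doc_vecs: List[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (document index, cosine score) pairs, best match first."""
    query = _normalize(query_vec)
    scored = []
    for idx, doc in enumerate(doc_vecs):
        unit_doc = _normalize(doc)
        # The dot product of unit vectors equals their cosine similarity.
        scored.append((idx, sum(q * d for q, d in zip(query, unit_doc))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins for real embeddings: document 1 points the same way as the query.
hits = rank_documents([1.0, 0.0], [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]], top_k=2)
```

Normalizing once at indexing time, rather than on every query as done here, would avoid repeated work on a large corpus.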
Semantic Analysis
Vietnamese Sentence Similarity Calculation
Calculate semantic similarity between Vietnamese sentences for use in QA systems or chatbots
Achieved an average Spearman correlation score of 82.45 on the STS Benchmark