Halong Embedding

Developed by hiieu
A Vietnamese text embedding model focused on RAG (Retrieval-Augmented Generation) and productivity, fine-tuned from intfloat/multilingual-e5-base.
Downloads 7,651
Release Time: 7/6/2024

Model Overview

Halong Embedding is a sentence-transformers model fine-tuned from intfloat/multilingual-e5-base. It specializes in Vietnamese text embedding and supports tasks such as semantic textual similarity, semantic search, paraphrase mining, text classification, and clustering.

Model Features

Matryoshka embedding
Trained with a Matryoshka loss function, so embedding vectors can be truncated with minimal performance loss; shorter vectors are faster to compare.
Multilingual support
Primarily targets Vietnamese, while also supporting other languages.
Efficient retrieval
Focused on RAG (Retrieval-Augmented Generation) and productivity, optimizing information retrieval performance.
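
The Matryoshka truncation described above can be illustrated with a minimal sketch. It assumes an embedding is a plain list of floats; the 8-dimensional toy vectors and the helper names `truncate_and_normalize` and `cosine` are hypothetical and stand in for real model output. The idea is to keep the leading components of each vector, rescale to unit length, and compare the shorter vectors.

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (assumption): real embeddings would come from the model.
query = [0.9, 0.1, 0.3, 0.0, 0.05, -0.02, 0.01, 0.03]
doc = [0.8, 0.2, 0.25, 0.1, -0.04, 0.03, 0.02, -0.01]

full_score = cosine(query, doc)
short_score = cosine(
    truncate_and_normalize(query, 4),
    truncate_and_normalize(doc, 4),
)
print(f"full: {full_score:.4f}, truncated to 4 dims: {short_score:.4f}")
```

With the sentence-transformers library, the same effect is usually obtained by passing `truncate_dim` when constructing `SentenceTransformer`, rather than slicing vectors by hand.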

Model Capabilities

Semantic text similarity calculation
Semantic search
Paraphrase mining
Text classification
Cluster analysis

Use Cases

Information retrieval
Legal document retrieval
Model performance was evaluated on the Zalo legal retrieval dataset, where the goal is to quickly find relevant legal documents.
Accuracy@1 reached 0.8294 and Accuracy@10 reached 0.9687.
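
For readers unfamiliar with the metric, here is a minimal sketch of how Accuracy@k is commonly computed: the fraction of queries for which at least one relevant document appears among the top k results. The query and document IDs below are made up for illustration and are not taken from the Zalo dataset; the function name `accuracy_at_k` is likewise hypothetical.

```python
def accuracy_at_k(ranked, relevant, k):
    """Fraction of queries with at least one relevant document in the top k.

    ranked:   maps query id to a list of document ids, best match first
    relevant: maps query id to the set of relevant document ids
    """
    hits = 0
    for query_id, doc_ids in ranked.items():
        if any(doc_id in relevant[query_id] for doc_id in doc_ids[:k]):
            hits += 1
    return hits / len(ranked)

# Toy retrieval results (assumption): three queries, three results each.
ranked = {
    "q1": ["d3", "d1", "d7"],
    "q2": ["d2", "d5", "d9"],
    "q3": ["d4", "d6", "d8"],
}
relevant = {"q1": {"d1"}, "q2": {"d2"}, "q3": {"d0"}}

print(accuracy_at_k(ranked, relevant, 1))
print(accuracy_at_k(ranked, relevant, 3))
```

Accuracy@10 is always at least as high as Accuracy@1, because a larger cutoff can only add hits.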
Health domain Q&A
Health benefits query
Retrieving information about the health benefits of football.
Relevant documents are ranked by cosine similarity; the highest similarity score was 0.7318.
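
The ranking step in this use case can be sketched as follows. This is an illustration under stated assumptions, not the model's own pipeline: `toy_embed` is a hypothetical keyword-count stand-in for the real embedding model, and the documents are invented. In practice, `embed` would wrap a call to the embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_documents(query, documents, embed):
    """Return (document, score) pairs sorted by descending cosine similarity."""
    query_vec = embed(query)
    scored = [(doc, cosine(query_vec, embed(doc))) for doc in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical stand-in embedder: counts a few keywords per text.
KEYWORDS = ("football", "health", "law", "contract")

def toy_embed(text):
    words = text.lower().split()
    return [float(words.count(keyword)) for keyword in KEYWORDS]

docs = [
    "football improves heart health",
    "a contract is governed by law",
    "football rules and league tables",
]
results = rank_documents("health benefits of football", docs, toy_embed)
for doc, score in results:
    print(f"{score:.4f}  {doc}")
```

Passing the embedder in as a function keeps the ranking logic independent of the model, so the toy embedder can be swapped for a real one without changing `rank_documents`.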