Dek21 Hcmute Embedding

Developed by huyydangg
Vietnamese text embedding model focused on RAG and production efficiency, trained on a dataset of approximately 100,000 legal questions
Downloads 696
Release Time: 1/25/2025

Model Overview

This Vietnamese sentence-transformer model is designed for similarity calculation and information retrieval over legal texts. It was trained with Matryoshka ("Russian doll") loss, which allows its embeddings to be truncated for faster comparison.

Model Features

Matryoshka ("Russian doll") loss training
Embedding vectors can be truncated with minimal performance loss; shorter vectors are faster to compare, which improves production efficiency
Legal domain optimization
Trained on an internal dataset of approximately 100,000 legal questions and their related contexts, making it particularly suitable for legal text processing
Efficient vector comparison
Supports embedding vectors of 768, 512, 256, 128, or 64 dimensions, so the vector size can be chosen to balance retrieval quality against speed and storage
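
The truncation described above can be sketched as follows. This is a minimal illustration, not the author's code: `truncate_embedding` and `dot` are hypothetical helper names, and the sketch assumes that a shorter vector is obtained by keeping the leading components and rescaling to unit length, which is the usual practice for embeddings trained with a Matryoshka-style ("Russian doll") loss.

```python
import math

# Dimensions listed in the model description.
SUPPORTED_DIMS = (768, 512, 256, 128, 64)


def truncate_embedding(vector, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    if dim not in SUPPORTED_DIMS:
        raise ValueError(f"dim must be one of {SUPPORTED_DIMS}")
    if len(vector) < dim:
        raise ValueError("vector is shorter than the requested dimension")
    head = [float(x) for x in vector[:dim]]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to rescale
    return [x / norm for x in head]


def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))
```

Because the truncated vectors are unit length, a plain dot product gives their cosine similarity, and comparing two 64-dimensional vectors takes one twelfth of the multiplications needed for two 768-dimensional ones.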

Model Capabilities

Legal text similarity calculation
Legal information retrieval
Legal clause matching
Vietnamese text feature extraction
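
Similarity calculation and retrieval with embeddings usually come down to ranking candidate texts by cosine similarity to a query. The sketch below shows that step with toy three-dimensional vectors standing in for real embeddings; `cosine` and `top_k` are hypothetical helper names, and nothing here is taken from the model's own code.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, clause_vecs, k=3):
    """Return (index, score) pairs for the k clauses most similar to the query."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(clause_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors; real inputs would be the model's embeddings of a query and of legal clauses.
query = [1.0, 0.0, 0.0]
clauses = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
best = top_k(query, clauses, k=2)
```

For a large clause collection, the same ranking is normally delegated to a vector index rather than a Python loop; the loop is kept here only to make the computation explicit.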

Use Cases

Legal information retrieval
Legal clause matching: matching user queries with relevant legal clauses. Achieved a cosine accuracy@1 of 0.5856 on the test dataset.
Legal Q&A system: building a knowledge-based legal Q&A system. Achieved an NDCG@3 of 0.9084 on the VMTEB-Zalo-legel-retrieval-wseg dataset.
Legal document processing
Legal document classification: automatic classification of legal documents.
Legal document clustering: automatic clustering of similar legal documents.
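
The two retrieval scores quoted above, accuracy@1 and NDCG@3, can be read with the help of the sketch below. The page does not say how the scores were computed, so this assumes the common binary-relevance definitions: accuracy@k is 1 when any relevant document appears in the top k results, and NDCG@k discounts each hit by the logarithm of its rank. The function names are hypothetical.

```python
import math


def accuracy_at_k(ranked_ids, relevant_ids, k):
    """1.0 if any of the top-k ranked ids is relevant, otherwise 0.0."""
    return 1.0 if any(doc in relevant_ids for doc in ranked_ids[:k]) else 0.0


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(position + 2)  # position 0 is rank 1, discount log2(2) = 1
        for position, doc in enumerate(ranked_ids[:k])
        if doc in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def mean_metric(metric, runs, k):
    """Average a per-query metric over (ranked_ids, relevant_ids) pairs."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in runs) / len(runs)
```

A reported figure such as accuracy@1 is then the mean of the per-query values over all test queries, for example `mean_metric(accuracy_at_k, runs, 1)`.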