
Vietnamese Llama2 7b 40GB

Developed by bkai-foundation-models
A Vietnamese-optimized model based on Llama2-chat 7B, significantly improving Vietnamese language processing through incremental pre-training and an efficient tokenizer
Downloads: 23
Release Time: 10/26/2023

Model Overview

This model is a Vietnamese-optimized variant of Llama2. By retraining the tokenizer and performing incremental pre-training on Vietnamese data, it significantly improves Vietnamese text encoding efficiency and is well suited to Vietnamese natural language processing tasks.
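A minimal usage sketch with the Hugging Face transformers library is shown below. The repository ID "bkai-foundation-models/vietnamese-llama2-7b-40GB" and the generation settings are assumptions; check the model's page for the exact identifier and recommended parameters.

```python
# Sketch: generating Vietnamese text with the model via transformers.
# The repo ID below is an assumption; verify it on the model page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bkai-foundation-models/vietnamese-llama2-7b-40GB"  # assumed repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Hà Nội là thủ đô của Việt Nam."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sampling settings are illustrative; tune them for your task.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```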

Model Features

Efficient Vietnamese Tokenization
Uses a dedicated tokenizer trained with SentencePiece, improving Vietnamese encoding efficiency by 70% compared to the original Llama2 (a tokenizer comparison sketch follows this list)
Mixed Data Training
Uses a 40.5GB mixed dataset (Vietnamese news, Wikipedia, legal documents, and English data) for incremental pre-training
LoRA Adaptation
Employs Low-Rank Adaptation (LoRA) for efficient training and provides independent LoRA modules for easy integration (a LoRA loading sketch follows this list)
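The tokenizer improvement can be observed by counting tokens for the same Vietnamese sentence with both tokenizers. The repository IDs below are assumptions, and the Llama2 base model is gated on Hugging Face.

```python
# Sketch: compare token counts of the base Llama2 tokenizer and the
# Vietnamese SentencePiece tokenizer. Repo IDs are assumptions.
from transformers import AutoTokenizer

base_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")  # gated base model
viet_tok = AutoTokenizer.from_pretrained("bkai-foundation-models/vietnamese-llama2-7b-40GB")  # assumed ID

text = "Mô hình ngôn ngữ lớn đang thay đổi cách chúng ta xử lý tiếng Việt."

print("Llama2 tokens:     ", len(base_tok.tokenize(text)))
print("Vietnamese tokens: ", len(viet_tok.tokenize(text)))
```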
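Because the LoRA module is distributed separately, it can be attached to a Llama2-chat base model with the peft library. The adapter repository ID below is a placeholder assumption; substitute the actual LoRA weights published by the authors.

```python
# Sketch: attach the standalone LoRA module to a Llama2-chat base model
# using peft. The adapter repo ID is a placeholder assumption.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_id = "meta-llama/Llama-2-7b-chat-hf"  # gated base model
adapter_id = "bkai-foundation-models/vietnamese-llama2-7b-40GB-lora"  # placeholder

model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

# Optionally merge the LoRA weights into the base model for faster inference.
model = model.merge_and_unload()
```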

Model Capabilities

Vietnamese Text Generation
English Text Generation
Cross-language Understanding

Use Cases

Content Generation
Vietnamese News Generation
Trained on news corpora, capable of generating news content that conforms to Vietnamese language conventions
Legal Assistance
Legal Document Processing
Trained on extensive Vietnamese legal texts, suitable for legal document analysis and generation