
Vietnamese Llama2 7b 120GB

Developed by bkai-foundation-models
A Vietnamese-optimized large language model based on Llama-2-7B, enhanced through continual pre-training on 124GB of multi-domain Vietnamese and English data for improved language understanding.
Downloads 65
Release Date: 12/20/2023

Model Overview

This is a 7B-parameter large language model optimized for Vietnamese. It uses LoRA for continual pre-training on multi-domain Vietnamese data, substantially improving the efficiency of Vietnamese text processing.
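The LoRA approach mentioned above keeps the base model's weights frozen and learns only a low-rank additive update. A minimal numerical sketch of that idea, using illustrative dimensions and rank (not the model's actual configuration):

```python
import numpy as np

def lora_effective_weight(W, A, B, alpha):
    """Return the effective weight W + (alpha / r) * B @ A used by a LoRA layer.

    W: frozen base weight (d_out, d_in)
    A: trainable down-projection (r, d_in)
    B: trainable up-projection (d_out, r), initialized to zero
    alpha: LoRA scaling factor
    """
    r = A.shape[0]  # adapter rank
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 2          # illustrative sizes only
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
B = np.zeros((d_out, r))          # B starts at zero, so training begins from W
W_eff = lora_effective_weight(W, A, B, alpha=4)
```

Because B is initialized to zero, the effective weight equals the frozen base weight at the start of continual pre-training, and any learned update has rank at most r, which is what makes the method parameter-efficient.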

Model Features

Optimized Vietnamese tokenizer
A SentencePiece tokenizer trained on an extensive Vietnamese corpus, producing roughly 50% fewer tokens than ChatGPT's tokenizer and roughly 70% fewer than the original Llama2 tokenizer on Vietnamese text
Multi-domain pre-training data
Integrates 124GB of high-quality data (104GB Vietnamese + 20GB English) spanning news, Wikipedia, books, and legal documents
LoRA efficient fine-tuning
Uses LoRA for continual pre-training, enhancing Vietnamese capability while keeping the base model's parameters frozen
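The token-count reductions claimed above can be made concrete with simple arithmetic. The counts below are hypothetical, chosen only to illustrate what a 50% and 70% reduction means for one Vietnamese passage:

```python
def token_reduction(base_tokens: int, new_tokens: int) -> float:
    """Percentage fewer tokens the new tokenizer produces versus a baseline."""
    return 100.0 * (base_tokens - new_tokens) / base_tokens

# Hypothetical counts for one Vietnamese paragraph (illustrative only):
llama2_count = 100   # original Llama2 tokenizer
chatgpt_count = 60   # ChatGPT tokenizer
new_count = 30       # Vietnamese-optimized SentencePiece tokenizer

reduction_vs_llama2 = token_reduction(llama2_count, new_count)    # 70.0
reduction_vs_chatgpt = token_reduction(chatgpt_count, new_count)  # 50.0
```

Fewer tokens per passage means more Vietnamese text fits in the same context window and generation costs fewer decoding steps, which is why tokenizer fertility matters for a language-specific model.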

Model Capabilities

Vietnamese text generation
English text generation
Cross-language understanding
Multi-domain text processing
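A sketch of how the model could be used for Vietnamese text generation with the Hugging Face transformers library. The repo id is an assumption based on the developer and model name listed above, and actually running this requires downloading roughly 14GB of weights:

```python
def generate_vietnamese(
    prompt: str,
    model_id: str = "bkai-foundation-models/vietnamese-llama2-7b-120GB",  # assumed repo id
) -> str:
    """Load the model and generate a continuation of a Vietnamese prompt."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # lazy import: heavy dependency

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs, max_new_tokens=128, do_sample=True, temperature=0.7
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Note that this is a base (continually pre-trained) model rather than an instruction-tuned chat model, so it is best suited to text continuation rather than question answering.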

Use Cases

Content generation
Vietnamese news writing
Trained on a news corpus, the model can assist with drafting news content
Legal document processing
Trained on legal documents, it can handle specialized legal text
Education
Vietnamese learning assistance
Can serve as a reference language model for Vietnamese learners