
Bert Base Thai

Developed by monsoon-nlp
Thai-specific pre-trained model based on the BERT-Base architecture, with tokenization adapted to Thai; reported to outperform multilingual BERT on the Thai XNLI task
Downloads 177
Release Time: 3/2/2022

Model Overview

This project provides a BERT model pre-trained specifically for Thai. Thai was excluded from the original multilingual BERT because of tokenization difficulties; this model addresses that gap with customized preprocessing and tokenization for Thai text.

Model Features

Thai-specific tokenization
Uses the BPEmb pre-trained SentencePiece model with a 25,000-token vocabulary, suited to Thai's lack of explicit word separators
Performance advantage
Achieves accuracy 2.8 percentage points higher than multilingual BERT on the Thai XNLI task
Complete preprocessing pipeline
Provides a full preprocessing solution from raw Thai text to model input, including special sentence segmentation handling
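
To illustrate why the lack of word separators matters, here is a minimal sketch that contrasts whitespace splitting with subword segmentation over a fixed vocabulary. It is only an illustration: the four-piece vocabulary and the function names are hypothetical, and the greedy longest-match rule is a simple stand-in, not the algorithm or vocabulary of the BPEmb SentencePiece model.

```python
# Toy vocabulary (hypothetical); the real model uses a 25,000-piece
# SentencePiece vocabulary from BPEmb.
WORDS = ["ฉัน", "รัก", "ภาษา", "ไทย"]  # "I", "love", "language", "Thai"
TOY_VOCAB = set(WORDS)

# Thai writes these words with no spaces between them.
SENTENCE = "".join(WORDS)


def whitespace_tokens(text):
    """Split on whitespace, as one might for English."""
    return text.split()


def greedy_subword_tokens(text, vocab, unk="<unk>"):
    """Segment text by repeatedly taking the longest vocabulary piece.

    This is a greedy longest-match stand-in for illustration, not the
    SentencePiece segmentation used by the actual model.
    """
    max_len = max(len(piece) for piece in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            # No piece matched: emit an unknown marker for one character.
            tokens.append(unk)
            i += 1
    return tokens


if __name__ == "__main__":
    print(whitespace_tokens(SENTENCE))
    print(greedy_subword_tokens(SENTENCE, TOY_VOCAB))
```

Whitespace splitting returns the whole sentence as a single token, while the vocabulary-based pass recovers four pieces. That gap is what a Thai-specific subword vocabulary is meant to close.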

Model Capabilities

Thai text representation
Cross-sentence relationship understanding
Downstream task fine-tuning
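
As a rough sketch of what downstream fine-tuning setup might look like, the snippet below prepares the model with a 5-class head for 1-5 star ratings using the Hugging Face transformers library. The Hub identifier `monsoon-nlp/bert-base-thai` is an assumption (this listing does not give it), the helper names are hypothetical, and raw Thai text would presumably need the project's own preprocessing before being passed to the tokenizer.

```python
# Assumed Hub identifier; it is not stated in this listing.
MODEL_ID = "monsoon-nlp/bert-base-thai"
NUM_STAR_CLASSES = 5  # ratings from 1 to 5 stars


def star_to_label(stars):
    """Map a 1-5 star rating to a zero-based class index."""
    if not 1 <= stars <= NUM_STAR_CLASSES:
        raise ValueError(f"stars must be in 1..{NUM_STAR_CLASSES}, got {stars}")
    return stars - 1


def label_to_star(label):
    """Map a zero-based class index back to a 1-5 star rating."""
    return label + 1


def load_for_star_rating(model_id=MODEL_ID):
    """Load tokenizer and model with a freshly initialised 5-way head.

    Requires the `transformers` package and network access. The
    classification head is untrained until fine-tuned on labelled reviews.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, num_labels=NUM_STAR_CLASSES
    )
    return tokenizer, model
```

The label helpers exist because classification heads index classes from zero, whereas star ratings start at one.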

Use Cases

Text classification
Restaurant review classification
1-5 star rating classification for restaurant reviews on Wongnai platform
Achieved 0.56612 accuracy on public test set
Cross-language understanding
XNLI Thai task
Thai natural language inference task
Achieved 68.9% accuracy, outperforming multilingual BERT models
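
The figures above are plain accuracy scores. The sketch below shows how such a score is computed and how a gap between two models is expressed in percentage points. The toy predictions are made up for illustration, and the implied multilingual BERT baseline holds only if the 2.8 figure quoted under Model Features is read as percentage points.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def implied_baseline(model_acc_percent, gap_points):
    """Baseline accuracy implied by a model score and a percentage-point gap."""
    return round(model_acc_percent - gap_points, 1)


if __name__ == "__main__":
    # Made-up star ratings, not taken from the Wongnai test set.
    toy_predictions = [5, 4, 3, 1]
    toy_gold = [5, 4, 2, 1]
    print(accuracy(toy_predictions, toy_gold))

    # Thai XNLI: 68.9% reported, assumed 2.8 points above multilingual BERT.
    print(implied_baseline(68.9, 2.8))
```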