Bert Tagalog Base Uncased WWM
A BERT variant trained on large-scale Tagalog text with whole-word masking, suitable for Filipino natural language processing tasks.
Downloads: 18
Release Time: 3/2/2022
Model Overview
This is a BERT model trained specifically for Tagalog (Filipino), employing whole-word masking during pre-training, and released to advance Filipino NLP research and applications.
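As a quick orientation, here is a minimal sketch of loading the checkpoint as a fill-mask pipeline with Hugging Face transformers. The repository name `jcblaise/bert-tagalog-base-uncased-WWM` and the example sentence are assumptions for illustration; substitute the model's actual Hugging Face identifier.

```python
from transformers import pipeline

# Hypothetical Hugging Face repository name; replace with the model's actual ID.
MODEL_ID = "jcblaise/bert-tagalog-base-uncased-WWM"

# Load the checkpoint as a masked-language-modeling (fill-mask) pipeline.
fill_mask = pipeline("fill-mask", model=MODEL_ID)

# Ask the model to fill the masked word in a Tagalog sentence.
for prediction in fill_mask("Magandang [MASK] sa inyong lahat."):
    print(prediction["token_str"], round(prediction["score"], 4))
```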
Model Features
Whole-word masking technique
Masks all sub-word pieces of a word together instead of masking individual tokens, strengthening the model's understanding of complete semantic units (see the sketch after this list).
Optimized for low-resource languages
Specifically designed for Tagalog, a relatively resource-scarce language, helping to fill the gap in pre-trained models for Filipino.
Research-oriented
Part of a larger research project aimed at advancing the Filipino NLP community.
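To make the whole-word masking difference concrete, the sketch below contrasts standard token-level masking with whole-word masking using the data collators from Hugging Face transformers. The repository name, example sentence, and masking probability are illustrative assumptions, not details from the model card.

```python
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    DataCollatorForWholeWordMask,
)

# Hypothetical repository name used throughout these sketches.
MODEL_ID = "jcblaise/bert-tagalog-base-uncased-WWM"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# A Tagalog sentence; WordPiece may split some words into several sub-tokens.
encoding = tokenizer("Nagluluto siya ng masarap na pagkain",
                     return_special_tokens_mask=True)

# Standard MLM: each sub-token is masked independently, so only a fragment
# of a word (e.g. a "##" continuation piece) may end up as [MASK].
token_collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.3)

# Whole-word masking: all sub-tokens of a selected word are masked together,
# so the model must reconstruct the complete word from context.
wwm_collator = DataCollatorForWholeWordMask(tokenizer, mlm_probability=0.3)

for name, collator in [("token masking", token_collator),
                       ("whole-word masking", wwm_collator)]:
    batch = collator([encoding])
    print(name, "->", tokenizer.decode(batch["input_ids"][0]))
```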
Model Capabilities
Text classification
Language understanding
Semantic analysis
Word vector generation
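For the word-vector generation capability, a minimal sketch (again assuming the hypothetical repository name above, with made-up example sentences) that mean-pools the final hidden states into one fixed-size vector per sentence:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical repository name; replace with the actual Hugging Face ID.
MODEL_ID = "jcblaise/bert-tagalog-base-uncased-WWM"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

sentences = [
    "Masaya ako ngayon.",
    "Malungkot ang panahon kahapon.",
]

# Tokenize a small batch of Tagalog sentences with padding.
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the final hidden states over non-padding tokens to obtain
# one fixed-size vector per sentence (768 dimensions for a BERT-base model).
mask = inputs["attention_mask"].unsqueeze(-1).float()
sentence_vectors = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

print(sentence_vectors.shape)  # torch.Size([2, 768])
```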
Use Cases
Academic research
Low-resource language model research
Used for studying model training and fine-tuning techniques in low-resource languages.
Related findings have been published as arXiv preprints.
Commercial applications
Filipino text classification
Can be used in commercial applications such as content classification and sentiment analysis of Filipino text.
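A minimal fine-tuning sketch for Filipino text classification with the Hugging Face Trainer follows. The repository name, the two toy sentiment examples, and the label scheme are illustrative assumptions; a real application would use a properly labeled Filipino corpus.

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical repository name; the toy data and labels are purely illustrative.
MODEL_ID = "jcblaise/bert-tagalog-base-uncased-WWM"

# Tiny in-memory sentiment dataset (1 = positive, 0 = negative).
train_data = Dataset.from_dict({
    "text": [
        "Ang ganda ng serbisyo nila!",
        "Sobrang bagal ng delivery, nakakainis.",
    ],
    "label": [1, 0],
})

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

train_data = train_data.map(tokenize, batched=True)

# BERT encoder with a freshly initialized 2-class classification head.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bert-tagalog-clf",
        num_train_epochs=1,
        per_device_train_batch_size=2,
    ),
    train_dataset=train_data,
)
trainer.train()
```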