
Bert Tagalog Base Uncased WWM

Developed by jcblaise
A BERT variant trained on large-scale Tagalog text using whole-word masking, suitable for Filipino natural language processing tasks.
Downloads 18
Release Time: 3/2/2022

Model Overview

This is a BERT model trained specifically for Tagalog (Filipino). It employs whole-word masking during pre-training and is intended to advance Filipino NLP research and applications.

Model Features

Whole-word masking technique
Uses whole-word masking instead of single-token masking, strengthening the model's grasp of complete semantic units (see the sketch after this list)
Optimized for low-resource languages
Specifically designed for Tagalog, a relatively resource-scarce language, filling the gap in Filipino pre-trained models
Research-oriented
Part of a larger research project aimed at advancing the Filipino NLP community
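The difference is easiest to see at the tokenizer level. The sketch below is a minimal illustration, assuming the model is published on Hugging Face as jcblaise/bert-tagalog-base-uncased-WWM (the exact repository id is an assumption) and using a hypothetical Tagalog sentence; it shows how whole-word masking replaces every subword piece of a chosen word rather than a single piece.

```python
# Minimal sketch: whole-word masking masks all subword pieces of a word together.
# The repository id and the example sentence are assumptions for illustration.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jcblaise/bert-tagalog-base-uncased-WWM")

sentence = "Magandang umaga sa inyong lahat"  # hypothetical Tagalog sentence

for word in sentence.split():
    pieces = tokenizer.tokenize(word)
    # Token-level masking could hide just one of these pieces;
    # whole-word masking hides every piece of the selected word at once.
    masked = [tokenizer.mask_token] * len(pieces)
    print(f"{word!r}: {pieces} -> {masked}")
```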

Model Capabilities

Text classification
Language understanding
Semantic analysis
Word vector generation
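For language understanding and word vector generation, the encoder can be loaded with the Transformers library. The sketch below is a minimal example, assuming the Hugging Face repository id jcblaise/bert-tagalog-base-uncased-WWM and a hypothetical input sentence; it extracts per-token contextual embeddings and mean-pools them into a simple sentence vector.

```python
# Minimal sketch of word/sentence vector generation with the Tagalog BERT encoder.
# The repository id and the input text are assumptions for illustration.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "jcblaise/bert-tagalog-base-uncased-WWM"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

text = "Masaya ako ngayon"  # hypothetical Tagalog input
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Per-token contextual embeddings: (batch, sequence_length, hidden_size)
token_vectors = outputs.last_hidden_state
# A simple sentence vector: mean-pool the token embeddings
sentence_vector = token_vectors.mean(dim=1)
print(token_vectors.shape, sentence_vector.shape)
```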

Use Cases

Academic research
Low-resource language model research
Used for studying model training and fine-tuning techniques in low-resource languages
Related findings have been published in arXiv papers
Commercial applications
Filipino text classification
Can be used for commercial applications like content classification and sentiment analysis in Filipino
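As an illustration of the classification use case, the sketch below fine-tunes the encoder with a sequence-classification head on a toy sentiment batch. The repository id, the example texts, and the labels are assumptions for demonstration, not part of the original model release.

```python
# Minimal fine-tuning sketch for Filipino text classification (toy sentiment data).
# Repository id, texts, and labels are assumptions for illustration only.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "jcblaise/bert-tagalog-base-uncased-WWM"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

texts = ["Ang ganda ng serbisyo", "Hindi ko gusto ang produkto"]  # hypothetical data
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative

encodings = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):  # a few passes over the toy batch
    optimizer.zero_grad()
    outputs = model(**encodings, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss={outputs.loss.item():.4f}")
```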