Bert Tagalog Base Uncased WWM
A BERT variant trained on large-scale Tagalog text with whole-word masking, suitable for Filipino natural language processing tasks.
Downloads: 18
Release Time: 3/2/2022
Model Overview
This is a BERT model trained specifically for Tagalog (Filipino), employing whole-word masking during pre-training, and released to advance Filipino NLP research and applications.
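As a quick orientation, here is a minimal sketch of loading the checkpoint as a fill-mask pipeline with Hugging Face transformers. The repository name `jcblaise/bert-tagalog-base-uncased-WWM` and the example sentence are assumptions for illustration; substitute the model's actual Hugging Face identifier.

```python
from transformers import pipeline

# Hypothetical Hugging Face repository name; replace with the model's actual ID.
MODEL_ID = "jcblaise/bert-tagalog-base-uncased-WWM"

# Load the checkpoint as a masked-language-modeling (fill-mask) pipeline.
fill_mask = pipeline("fill-mask", model=MODEL_ID)

# Ask the model to fill the masked word in a Tagalog sentence.
for prediction in fill_mask("Magandang [MASK] sa inyong lahat."):
    print(prediction["token_str"], round(prediction["score"], 4))
```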
Model Features
Whole-word masking technique
Masks all sub-word pieces of a word together instead of masking individual tokens, strengthening the model's understanding of complete semantic units (see the sketch after this list).
Optimized for low-resource languages
Specifically designed for Tagalog, a relatively resource-scarce language, helping to fill the gap in pre-trained models for Filipino.
Research-oriented
Part of a larger research project aimed at advancing the Filipino NLP community.
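To make the whole-word masking difference concrete, the sketch below contrasts standard token-level masking with whole-word masking using the data collators from Hugging Face transformers. The repository name, example sentence, and masking probability are illustrative assumptions, not details from the model card.

```python
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    DataCollatorForWholeWordMask,
)

# Hypothetical repository name used throughout these sketches.
MODEL_ID = "jcblaise/bert-tagalog-base-uncased-WWM"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# A Tagalog sentence; WordPiece may split some words into several sub-tokens.
encoding = tokenizer("Nagluluto siya ng masarap na pagkain",
                     return_special_tokens_mask=True)

# Standard MLM: each sub-token is masked independently, so only a fragment
# of a word (e.g. a "##" continuation piece) may end up as [MASK].
token_collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.3)

# Whole-word masking: all sub-tokens of a selected word are masked together,
# so the model must reconstruct the complete word from context.
wwm_collator = DataCollatorForWholeWordMask(tokenizer, mlm_probability=0.3)

for name, collator in [("token masking", token_collator),
                       ("whole-word masking", wwm_collator)]:
    batch = collator([encoding])
    print(name, "->", tokenizer.decode(batch["input_ids"][0]))
```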
Model Capabilities
Text classification
Language understanding
Semantic analysis
Word vector generation
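For the word-vector generation capability, a minimal sketch (again assuming the hypothetical repository name above, with made-up example sentences) that mean-pools the final hidden states into one fixed-size vector per sentence:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical repository name; replace with the actual Hugging Face ID.
MODEL_ID = "jcblaise/bert-tagalog-base-uncased-WWM"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

sentences = [
    "Masaya ako ngayon.",
    "Malungkot ang panahon kahapon.",
]

# Tokenize a small batch of Tagalog sentences with padding.
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the final hidden states over non-padding tokens to obtain
# one fixed-size vector per sentence (768 dimensions for a BERT-base model).
mask = inputs["attention_mask"].unsqueeze(-1).float()
sentence_vectors = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

print(sentence_vectors.shape)  # torch.Size([2, 768])
```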
Use Cases
Academic research
Low-resource language model research
Used for studying model training and fine-tuning techniques in low-resource languages.
Related findings have been published as arXiv preprints.
Commercial applications
Filipino text classification
Can be used in commercial applications such as content classification and sentiment analysis of Filipino text.
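A minimal fine-tuning sketch for Filipino text classification with the Hugging Face Trainer follows. The repository name, the two toy sentiment examples, and the label scheme are illustrative assumptions; a real application would use a properly labeled Filipino corpus.

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical repository name; the toy data and labels are purely illustrative.
MODEL_ID = "jcblaise/bert-tagalog-base-uncased-WWM"

# Tiny in-memory sentiment dataset (1 = positive, 0 = negative).
train_data = Dataset.from_dict({
    "text": [
        "Ang ganda ng serbisyo nila!",
        "Sobrang bagal ng delivery, nakakainis.",
    ],
    "label": [1, 0],
})

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

train_data = train_data.map(tokenize, batched=True)

# BERT encoder with a freshly initialized 2-class classification head.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bert-tagalog-clf",
        num_train_epochs=1,
        per_device_train_batch_size=2,
    ),
    train_dataset=train_data,
)
trainer.train()
```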