
Bert Base Indonesian 522M

Developed by cahya
A BERT base model pretrained on Indonesian Wikipedia with a masked language modeling (MLM) objective; the model is uncased (case-insensitive).
Downloads 2,799
Release Time: 3/2/2022

Model Overview

This is a BERT base model pretrained on Indonesian Wikipedia, primarily intended for Indonesian natural language processing tasks such as text classification and text generation.

Model Features

Case-Insensitive
The model is uncased; for example, 'indonesia' and 'Indonesia' are treated identically.
Based on Indonesian Wikipedia
Pretrained on 522 MB of Indonesian Wikipedia data, making it suitable for Indonesian natural language processing tasks.
WordPiece Tokenization
Uses WordPiece tokenization with a vocabulary size of 32,000 (see the sketch after this list).
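
A minimal sketch of inspecting the tokenizer with the Hugging Face transformers library. The Hub id cahya/bert-base-indonesian-522M is an assumption inferred from the model name and developer; verify it before use.

```python
from transformers import BertTokenizer

# Hub id assumed from the model name and developer (not stated on this page)
tokenizer = BertTokenizer.from_pretrained("cahya/bert-base-indonesian-522M")

print(tokenizer.vocab_size)  # expected: 32000, per the model card
# Uncased WordPiece: input is lowercased, rare words split into subword pieces
print(tokenizer.tokenize("Ibu ku sedang bekerja di supermarket"))
```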

Model Capabilities

Masked Language Modeling
Text Classification
Text Generation
Feature Extraction

Use Cases

Natural Language Processing
Fill Mask
Use the model to predict masked words in a sentence.
For example, given 'Ibu ku sedang bekerja [MASK] supermarket' ('My mother is working [MASK] the supermarket'), the model predicts the masked word 'di' ('at/in'), as in the sketch below.
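
A minimal fill-mask sketch using the transformers pipeline API, again assuming the Hub id cahya/bert-base-indonesian-522M:

```python
from transformers import pipeline

# Hub id assumed from the model name and developer
fill_mask = pipeline("fill-mask", model="cahya/bert-base-indonesian-522M")

# Top predictions for the masked position; 'di' ("at/in") is expected to rank highly
for pred in fill_mask("Ibu ku sedang bekerja [MASK] supermarket"):
    print(f"{pred['token_str']}\t{pred['score']:.4f}")
```
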
Text Feature Extraction
Use the model to extract text feature representations for downstream tasks.
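
A sketch of extracting token- and sentence-level features with BertModel, under the same assumed Hub id. Using the [CLS] vector as a sentence representation is one common convention, not something this page prescribes.

```python
import torch
from transformers import BertModel, BertTokenizer

model_name = "cahya/bert-base-indonesian-522M"  # assumed Hub id
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertModel.from_pretrained(model_name)

inputs = tokenizer("Ibu ku sedang bekerja di supermarket", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state: (batch, seq_len, 768) token-level features for BERT base
# The [CLS] vector at index 0 is a common sentence-level representation
sentence_embedding = outputs.last_hidden_state[:, 0, :]
print(sentence_embedding.shape)  # torch.Size([1, 768])
```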