
Bert Base Indonesian 1.5G

Developed by cahya
This is an Indonesian pretrained model based on BERT, trained on Wikipedia and newspaper data and suited to a range of natural language processing tasks.
Downloads 40.08k
Release Time: 3/2/2022

Model Overview

This is an Indonesian pretrained model based on the BERT architecture, trained with a masked language modeling (MLM) objective, and it supports a range of Indonesian text processing tasks.
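Because the model is trained with a masked language modeling objective, it can be queried directly through a fill-mask interface. The sketch below assumes the model is published on the Hugging Face Hub under the id cahya/bert-base-indonesian-1.5G; that identifier is inferred from the model name and developer and is an assumption, not something confirmed here.

```python
# Minimal sketch: query the MLM head through the fill-mask pipeline.
# The Hub id "cahya/bert-base-indonesian-1.5G" is an assumption, not confirmed here.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="cahya/bert-base-indonesian-1.5G")

# Ask the model to fill in the masked word of an Indonesian sentence.
for prediction in fill_mask("ibu ku sedang bekerja [MASK] supermarket"):
    print(prediction["token_str"], round(prediction["score"], 4))
```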

Model Features

Case Insensitive
The model is uncased, so it processes Indonesian text the same way regardless of capitalization.
Large-scale Pretraining Data
Pretrained using 522MB of Indonesian Wikipedia and 1GB of 2018 Indonesian newspaper data.
WordPiece Tokenization
Uses a WordPiece tokenizer with a vocabulary of 32,000 tokens for text processing.
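A minimal tokenizer sketch follows, again assuming the Hub id cahya/bert-base-indonesian-1.5G; the printed vocabulary size and the lowercasing behavior are shown for illustration only.

```python
# Minimal sketch: inspect the WordPiece tokenizer.
# The Hub id "cahya/bert-base-indonesian-1.5G" is an assumption, not confirmed here.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("cahya/bert-base-indonesian-1.5G")

print(tokenizer.vocab_size)  # expected to be roughly 32,000
# Because the model is uncased, mixed-case input is lowercased before splitting.
print(tokenizer.tokenize("Ibu ku sedang bekerja di Supermarket"))
```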

Model Capabilities

Text Feature Extraction
Masked Language Modeling
Indonesian Text Processing

Use Cases

Natural Language Processing
Text Infilling
Uses masked language modeling to predict missing words in sentences.
For example, given 'ibu ku sedang bekerja [MASK] supermarket', the model correctly predicts the masked word 'di'.
Text Feature Extraction
Obtain vector representations of Indonesian texts for downstream tasks.
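A minimal feature extraction sketch under the same assumed Hub id is shown below; mean pooling of the last hidden state is one common way to turn token embeddings into a sentence vector, used here as an illustration rather than a prescribed method.

```python
# Minimal sketch: extract a sentence vector from the base encoder.
# The Hub id "cahya/bert-base-indonesian-1.5G" is an assumption, not confirmed here.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "cahya/bert-base-indonesian-1.5G"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

inputs = tokenizer("ibu ku sedang bekerja di supermarket", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One common choice (illustrative, not prescribed): mean-pool the last hidden
# state over tokens to get a single fixed-size vector.
sentence_vector = outputs.last_hidden_state.mean(dim=1)
print(sentence_vector.shape)  # (1, hidden_size); 768 for a BERT-base model
```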