Indobert-lite-large-p2 Open-source Indonesian Language Model - Free Support for Indonesian Content Processing

Indobert Lite Large P2

Developed by indobenchmark

IndoBERT is an advanced language model based on BERT, specifically designed for Indonesian, trained with masked language modeling and next sentence prediction objectives.

Large Language Model

Transformers

Open Source License: MIT #Indonesian-specific #Lightweight BERT #Large-scale pretraining

Downloads 117

Release Time: 3/2/2022

Model Overview

IndoBERT is a pretrained language model designed for Indonesian, supporting natural language understanding tasks and suitable for processing Indonesian text.

Model Features

Optimized for Indonesian

The model is specifically trained and optimized for Indonesian, enabling better understanding and processing of Indonesian text.

Lightweight Design

The Lite version has far fewer parameters than the standard model (17.7M versus 335.2M for the Large architecture), making it suitable for environments with limited resources.

Case Insensitive

The model is uncased, so it treats text the same way regardless of capitalization.

Model Capabilities

Indonesian text understanding

Masked language modeling

Next sentence prediction

Use Cases

Natural language processing

Text classification

Classifying Indonesian text

Named entity recognition

Identifying named entities in Indonesian text

🚀 IndoBERT-Lite Large Model (phase2 - uncased)

IndoBERT is a state-of-the-art language model for Indonesian based on the BERT model, trained using masked language modeling (MLM) and next sentence prediction (NSP) objectives.

✨ Features

IndoBERT is a cutting-edge language model tailored to Indonesian and built on the BERT architecture. The pre-trained model is trained with masked language modeling (MLM) and next sentence prediction (NSP) objectives.
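To make the MLM objective concrete, here is a minimal sketch of how masked training examples are typically built under the original BERT recipe: select about 15% of positions, then replace 80% of those with [MASK], 10% with a random token, and leave 10% unchanged. It is an illustration only. The source does not state IndoBERT's exact masking settings, and the function name `mask_tokens`, the toy vocabulary, and the sample sentence are hypothetical.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (corrupted_tokens, labels) following the original BERT recipe.

    labels[i] holds the original token at positions selected for prediction
    and None everywhere else.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return corrupted, labels

# Hypothetical whitespace-tokenized sentence and toy vocabulary.
tokens = "aku adalah anak yang suka membaca buku".split()
vocab = ["aku", "kamu", "anak", "buku", "rumah"]

# mask_prob is raised to 0.5 here only so the effect is visible on a short sentence.
corrupted, labels = mask_tokens(tokens, vocab, mask_prob=0.5, seed=1)
print(corrupted)
print(labels)
```

During pre-training the model sees `corrupted` as input and is scored only on the positions where `labels` is not None.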

📚 Documentation

All Pre-trained Models

Model	Details
`indobenchmark/indobert-base-p1`	124.5M params, Base architecture, trained on Indo4B (23.43 GB of text)
`indobenchmark/indobert-base-p2`	124.5M params, Base architecture, trained on Indo4B (23.43 GB of text)
`indobenchmark/indobert-large-p1`	335.2M params, Large architecture, trained on Indo4B (23.43 GB of text)
`indobenchmark/indobert-large-p2`	335.2M params, Large architecture, trained on Indo4B (23.43 GB of text)
`indobenchmark/indobert-lite-base-p1`	11.7M params, Base architecture, trained on Indo4B (23.43 GB of text)
`indobenchmark/indobert-lite-base-p2`	11.7M params, Base architecture, trained on Indo4B (23.43 GB of text)
`indobenchmark/indobert-lite-large-p1`	17.7M params, Large architecture, trained on Indo4B (23.43 GB of text)
`indobenchmark/indobert-lite-large-p2`	17.7M params, Large architecture, trained on Indo4B (23.43 GB of text)
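As a quick way to read the table, the sketch below computes how much smaller the Lite variants are than their standard counterparts. The parameter counts are copied from the table; the dictionary keys and the helper name `size_ratio` are illustrative and not part of any library.

```python
# Parameter counts in millions, copied from the table above.
PARAMS_M = {
    "indobert-base": 124.5,
    "indobert-large": 335.2,
    "indobert-lite-base": 11.7,
    "indobert-lite-large": 17.7,
}

def size_ratio(standard, lite):
    """How many times larger the standard model is than its Lite counterpart."""
    return PARAMS_M[standard] / PARAMS_M[lite]

for size in ("base", "large"):
    ratio = size_ratio(f"indobert-{size}", f"indobert-lite-{size}")
    print(f"{size}: standard is about {ratio:.1f}x the Lite parameter count")
```

By this count the Lite Base model has roughly a tenth of the parameters of the standard Base model, and the Lite Large model has roughly a nineteenth of the standard Large model's.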

💻 Usage Examples

Basic Usage

from transformers import BertTokenizer, AutoModel
# Load the tokenizer and the pre-trained encoder weights from the Hugging Face Hub
tokenizer = BertTokenizer.from_pretrained("indobenchmark/indobert-lite-large-p2")
model = AutoModel.from_pretrained("indobenchmark/indobert-lite-large-p2")

Advanced Usage

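import torch
# Encode the sentence and reshape the token IDs to a batch of one: shape (1, seq_len)
x = torch.LongTensor(tokenizer.encode('aku adalah anak [MASK]')).view(1,-1)
print(x, model(x)[0].sum())  # token IDs, then the sum over the first model output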
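The example above feeds a single sentence, so reshaping to `(1, seq_len)` is enough. Several sentences of different lengths must first be padded to a common length and paired with an attention mask that marks the real tokens. In practice, calling the tokenizer directly, for example `tokenizer(texts, padding=True, return_tensors="pt")`, handles this. The pure-Python sketch below only illustrates the logic: the helper `pad_batch` is hypothetical, the token IDs are made up, and a padding ID of 0 is an assumption (the real value is available as `tokenizer.pad_token_id`).

```python
def pad_batch(sequences, pad_id=0):
    """Pad token-ID lists to equal length and build the matching attention mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)       # right-pad with pad_id
        attention_mask.append([1] * len(seq) + [0] * n_pad)  # 1 = real token, 0 = padding
    return input_ids, attention_mask

# Made-up token IDs standing in for two encoded sentences of different length.
encoded = [[2, 41, 57, 93, 3], [2, 41, 3]]
input_ids, attention_mask = pad_batch(encoded)
print(input_ids)       # [[2, 41, 57, 93, 3], [2, 41, 3, 0, 0]]
print(attention_mask)  # [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]
```

Both lists can then be converted with `torch.LongTensor` and passed to the model as `input_ids` and `attention_mask`.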

📄 License

This project is licensed under the MIT license.

👥 Authors

IndoBERT was trained and evaluated by Bryan Wilie*, Karissa Vincentio*, Genta Indra Winata*, Samuel Cahyawijaya*, Xiaohong Li, Zhi Yuan Lim, Sidik Soleman, Rahmad Mahendra, Pascale Fung, Syafri Bahar, Ayu Purwarianti.

📚 Citation

If you use our work, please cite:

@inproceedings{wilie2020indonlu,
  title={IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding},
  author={Bryan Wilie and Karissa Vincentio and Genta Indra Winata and Samuel Cahyawijaya and X. Li and Zhi Yuan Lim and S. Soleman and R. Mahendra and Pascale Fung and Syafri Bahar and A. Purwarianti},
  booktitle={Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing},
  year={2020}
}
