
IndoBERTweet Base Uncased

Developed by indolem
The first pre-trained language model built specifically for Indonesian Twitter, created by extending an Indonesian BERT model (IndoBERT) with domain-specific vocabulary.
Downloads 2,848
Release Time: 3/2/2022

Model Overview

IndoBERTweet is a pre-trained language model for Indonesian Twitter. It uses an effective domain-specific vocabulary initialization method and performs strongly on a range of Indonesian Twitter NLP tasks.

Model Features

Domain-Specific Vocabulary Initialization
Initializes the embeddings of new Twitter-specific vocabulary items by average-pooling the embeddings of the BERT subwords each item splits into, which is more efficient than pretraining from scratch.
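The average-pooling initialization can be sketched as follows. This is a minimal illustration under stated assumptions, not the IndoBERTweet training code: the helper names `init_new_token_embedding` and `toy_subword_tokenize`, the toy vocabulary, and the two-dimensional embeddings are all hypothetical. Each new word is split with the original subword tokenizer, and its embedding is the mean of those subwords' existing embeddings.

```python
from typing import Callable, Dict, List

Vector = List[float]


def init_new_token_embedding(
    word: str,
    tokenize: Callable[[str], List[str]],
    embeddings: Dict[str, Vector],
) -> Vector:
    """Average-pool the existing subword embeddings of `word`.

    Hypothetical helper: `tokenize` stands in for the original (IndoBERT)
    subword tokenizer and `embeddings` for its embedding table.
    """
    pieces = tokenize(word)
    if not pieces:
        raise ValueError(f"tokenizer returned no subwords for {word!r}")
    dim = len(embeddings[pieces[0]])
    pooled = [0.0] * dim
    for piece in pieces:
        vec = embeddings[piece]
        for i in range(dim):
            pooled[i] += vec[i]
    # Mean over all subword pieces.
    return [x / len(pieces) for x in pooled]


def toy_subword_tokenize(word: str) -> List[str]:
    """Hypothetical greedy longest-match tokenizer over a fixed toy vocabulary.

    Word-internal pieces use the WordPiece-style "##" prefix.
    """
    vocab = {"gak", "##bis", "##a", "wkwk", "##wk"}
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while end > start:
            candidate = word[start:end] if start == 0 else "##" + word[start:end]
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matches: treat the whole word as unknown.
            return ["[UNK]"]
        pieces.append(match)
        start = end
    return pieces


if __name__ == "__main__":
    toy_embeddings = {
        "gak": [1.0, 0.0],
        "##bis": [0.0, 1.0],
        "##a": [1.0, 1.0],
        "wkwk": [2.0, 2.0],
        "##wk": [0.0, 0.0],
    }
    print(toy_subword_tokenize("gakbisa"))
    print(init_new_token_embedding("wkwkwk", toy_subword_tokenize, toy_embeddings))
```

In a real model, the pooled vectors would become new rows of the input embedding matrix after it is enlarged (for example with `resize_token_embeddings` in Hugging Face Transformers).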
Large-Scale Pretraining Data
Pretrained on 409 million word tokens of Indonesian tweets, about twice the amount of data used to pretrain IndoBERT.
Twitter Text Optimization
Handles content specific to Twitter, such as user mentions, URLs, and emojis.
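The kind of tweet normalization described above can be sketched with regular expressions. This is a hedged sketch, not the official preprocessing: it assumes the placeholder tokens `@USER` for mentions and `HTTPURL` for links (verify against the official IndoBERTweet repository), lowercases text on the assumption that this matches the uncased model, and uses the hypothetical helper name `normalize_tweet`. Emoji handling is left out; the third-party `emoji` package's `demojize` function is one way to turn emojis into text.

```python
import re

# Assumed placeholder tokens; check the official IndoBERTweet preprocessing.
USER_TOKEN = "@USER"
URL_TOKEN = "HTTPURL"

# A mention is '@' followed by word characters, not preceded by a word
# character (so e-mail addresses such as a@b.com are left alone).
MENTION_RE = re.compile(r"(?<!\w)@\w+")
# Simple URL pattern: http(s):// or www. followed by non-space characters.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def normalize_tweet(text: str) -> str:
    """Lowercase a tweet and replace mentions and URLs with placeholders."""
    text = text.lower()
    # Replace URLs first so an '@' inside a URL is not treated as a mention.
    text = URL_RE.sub(URL_TOKEN, text)
    text = MENTION_RE.sub(USER_TOKEN, text)
    # Collapse runs of whitespace.
    return " ".join(text.split())


if __name__ == "__main__":
    print(normalize_tweet("Halo @Budi_123 cek https://t.co/AbC ya!"))
```

The placeholders are inserted after lowercasing, so they keep their uppercase form; an uncased tokenizer would lowercase them again at tokenization time.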

Model Capabilities

Indonesian Twitter text understanding
Sentiment Analysis
Emotion Recognition
Hate Speech Detection
Named Entity Recognition

Use Cases

Social Media Analysis
Twitter Sentiment Analysis
Analyzes the sentiment of Indonesian Twitter users toward specific topics
Achieves 86.6% accuracy on the IndoLEM sentiment dataset
Hate Speech Detection
Identifies hate speech in Indonesian tweets
Achieves 88.8% accuracy on the HS1 dataset
Natural Language Processing
Named Entity Recognition
Identifies entities such as person names and locations in Indonesian Twitter text
Achieves 88.1% accuracy on formal text datasets