# NusaBert-ner-v1.3
This project presents NusaBert-ner-v1.3, a fine-tuned model based on cahya/NusaBert-v1.3. It is trained on the NER subset of the grit-id/id_nergrit_corpus dataset, offering high-performance token classification with a context length of 8192.
## Quick Start
```python
from transformers import pipeline

# Load the fine-tuned NER pipeline; grouped_entities merges word pieces
# belonging to the same entity into a single span.
ner = pipeline("ner", model="cahya/NusaBert-ner-v1.3", grouped_entities=True)

text = (
    "Jakarta, April 2025 - Polisi mengungkap sosok teman pemberi uang palsu "
    "kepada artis Sekar Arum Widara. Sosok tersebut ternyata adalah Bayu Setio "
    "Aribowo (BS), pegawai nonaktif Garuda yang ditangkap Polsek Tanah Abang "
    "di kasus serupa."
)
result = ner(text)
print(result)
```
## Features
- Fine-Tuned Model: based on cahya/NusaBert-v1.3, fine-tuned on the NER subset of grit-id/id_nergrit_corpus.
- Long Context Support: supports a context length of 8192, consistent with the pre-trained model.
- High Performance: strong precision, recall, F1, and accuracy on the validation set (see Model Information below).
## Documentation
### Model description
The dataset contains the following 19 entity types:

- `CRD`: Cardinal
- `DAT`: Date
- `EVT`: Event
- `FAC`: Facility
- `GPE`: Geopolitical Entity
- `LAW`: Law Entity (such as Undang-Undang)
- `LOC`: Location
- `MON`: Money
- `NOR`: Political Organization
- `ORD`: Ordinal
- `ORG`: Organization
- `PER`: Person
- `PRC`: Percent
- `PRD`: Product
- `QTY`: Quantity
- `REG`: Religion
- `TIM`: Time
- `WOA`: Work of Art
- `LAN`: Language
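In practice the model emits BIO-style tags such as `B-PER` or `I-ORG` built from the labels above. The snippet below is a small illustrative helper (not part of the model or dataset) that maps such a tag back to its human-readable description:

```python
# Lookup table for the 19 entity types in the id_nergrit_corpus NER subset,
# taken from the list above.
ENTITY_LABELS = {
    "CRD": "Cardinal",
    "DAT": "Date",
    "EVT": "Event",
    "FAC": "Facility",
    "GPE": "Geopolitical Entity",
    "LAW": "Law Entity",
    "LOC": "Location",
    "MON": "Money",
    "NOR": "Political Organization",
    "ORD": "Ordinal",
    "ORG": "Organization",
    "PER": "Person",
    "PRC": "Percent",
    "PRD": "Product",
    "QTY": "Quantity",
    "REG": "Religion",
    "TIM": "Time",
    "WOA": "Work of Art",
    "LAN": "Language",
}

def describe(tag: str) -> str:
    """Map a BIO tag ("B-PER", "I-ORG") or bare label ("PER") to its description.

    This is a hypothetical convenience helper for post-processing pipeline
    output; it is not shipped with the model.
    """
    if tag == "O":
        return "Outside"  # token is not part of any entity
    label = tag.split("-", 1)[-1]  # strip the "B-"/"I-" prefix if present
    return ENTITY_LABELS.get(label, "Unknown")

print(describe("B-PER"))  # -> Person
```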
### Intended uses & limitations
More information needed
### Training and evaluation data
More information needed
### Training procedure

#### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3.0
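The total batch sizes listed above follow directly from the multi-GPU setup: each of the two devices processes a per-device batch, so the effective batch is the per-device size times the device count. A quick check with the run's own numbers:

```python
# Effective batch-size arithmetic for the distributed (2-GPU) training run
# described above: total = per-device batch size * number of devices.
per_device_train_batch_size = 32
per_device_eval_batch_size = 32
num_devices = 2

total_train_batch_size = per_device_train_batch_size * num_devices
total_eval_batch_size = per_device_eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # -> 64 64
```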
#### Training results
More information needed
#### Framework versions
- Transformers 4.49.0
- Pytorch 2.5.1+cu124
- Datasets 2.19.2
- Tokenizers 0.21.0
## License
This model is licensed under the MIT license.
## Model Information
| Property | Details |
|----------|---------|
| Library Name | transformers |
| Base Model | cahya/NusaBert-v1.3 |
| Tags | generated_from_trainer |
| Datasets | grit-id/id_nergrit_corpus |
| Metrics | precision, recall, f1, accuracy |
| Pipeline Tag | token-classification |
| Model Index Name | nusabert_nergrit_1.3 |

Evaluation results on the validation set:

| Metric | Score |
|--------|-------|
| Precision | 0.8010483135824977 |
| Recall | 0.8338275412169375 |
| F1 | 0.8171093159760562 |
| Accuracy | 0.9476653696498054 |
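The reported F1 is consistent with the reported precision and recall: F1 is their harmonic mean, which can be verified directly (using the exact figures from this card):

```python
# Sanity check: F1 should equal the harmonic mean of precision and recall,
# the standard computation for entity-level NER evaluation.
precision = 0.8010483135824977
recall = 0.8338275412169375

f1 = 2 * precision * recall / (precision + recall)
print(f1)  # matches the reported F1 of ~0.8171
```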