# NusaBert-ner-v1.3
This project presents NusaBert-ner-v1.3, a fine-tuned model based on cahya/NusaBert-v1.3. It is trained on the NER subset of the grit-id/id_nergrit_corpus dataset, offering high-performance token classification with a context length of 8192.
## Quick Start
```python
from transformers import pipeline

# Load the fine-tuned NER pipeline; grouped_entities merges word pieces
# belonging to the same entity into a single span.
ner = pipeline("ner", model="cahya/NusaBert-ner-v1.3", grouped_entities=True)

text = (
    "Jakarta, April 2025 - Polisi mengungkap sosok teman pemberi uang palsu "
    "kepada artis Sekar Arum Widara. Sosok tersebut ternyata adalah Bayu Setio "
    "Aribowo (BS), pegawai nonaktif Garuda yang ditangkap Polsek Tanah Abang "
    "di kasus serupa."
)
result = ner(text)
print(result)
```
## Features
- Fine-Tuned Model: based on cahya/NusaBert-v1.3, fine-tuned on the NER subset of grit-id/id_nergrit_corpus.
- Long Context Support: supports a context length of 8192, consistent with the pre-trained model.
- High Performance: strong precision, recall, F1, and accuracy on the validation set (see Model Information below).
## Documentation
### Model description
The dataset contains the following 19 entity types:

- `CRD`: Cardinal
- `DAT`: Date
- `EVT`: Event
- `FAC`: Facility
- `GPE`: Geopolitical Entity
- `LAW`: Law Entity (such as Undang-Undang)
- `LOC`: Location
- `MON`: Money
- `NOR`: Political Organization
- `ORD`: Ordinal
- `ORG`: Organization
- `PER`: Person
- `PRC`: Percent
- `PRD`: Product
- `QTY`: Quantity
- `REG`: Religion
- `TIM`: Time
- `WOA`: Work of Art
- `LAN`: Language
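In practice the model emits BIO-style tags such as `B-PER` or `I-ORG` built from the labels above. The snippet below is a small illustrative helper (not part of the model or dataset) that maps such a tag back to its human-readable description:

```python
# Lookup table for the 19 entity types in the id_nergrit_corpus NER subset,
# taken from the list above.
ENTITY_LABELS = {
    "CRD": "Cardinal",
    "DAT": "Date",
    "EVT": "Event",
    "FAC": "Facility",
    "GPE": "Geopolitical Entity",
    "LAW": "Law Entity",
    "LOC": "Location",
    "MON": "Money",
    "NOR": "Political Organization",
    "ORD": "Ordinal",
    "ORG": "Organization",
    "PER": "Person",
    "PRC": "Percent",
    "PRD": "Product",
    "QTY": "Quantity",
    "REG": "Religion",
    "TIM": "Time",
    "WOA": "Work of Art",
    "LAN": "Language",
}

def describe(tag: str) -> str:
    """Map a BIO tag ("B-PER", "I-ORG") or bare label ("PER") to its description.

    This is a hypothetical convenience helper for post-processing pipeline
    output; it is not shipped with the model.
    """
    if tag == "O":
        return "Outside"  # token is not part of any entity
    label = tag.split("-", 1)[-1]  # strip the "B-"/"I-" prefix if present
    return ENTITY_LABELS.get(label, "Unknown")

print(describe("B-PER"))  # -> Person
```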
### Intended uses & limitations
More information needed
### Training and evaluation data
More information needed
### Training procedure

#### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3.0
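The total batch sizes listed above follow directly from the multi-GPU setup: each of the two devices processes a per-device batch, so the effective batch is the per-device size times the device count. A quick check with the run's own numbers:

```python
# Effective batch-size arithmetic for the distributed (2-GPU) training run
# described above: total = per-device batch size * number of devices.
per_device_train_batch_size = 32
per_device_eval_batch_size = 32
num_devices = 2

total_train_batch_size = per_device_train_batch_size * num_devices
total_eval_batch_size = per_device_eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # -> 64 64
```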
#### Training results
More information needed
#### Framework versions
- Transformers 4.49.0
- Pytorch 2.5.1+cu124
- Datasets 2.19.2
- Tokenizers 0.21.0
## License
This model is licensed under the MIT license.
## Model Information
| Property | Details |
|----------|---------|
| Library Name | transformers |
| Base Model | cahya/NusaBert-v1.3 |
| Tags | generated_from_trainer |
| Datasets | grit-id/id_nergrit_corpus |
| Metrics | precision, recall, f1, accuracy |
| Pipeline Tag | token-classification |
| Model Index Name | nusabert_nergrit_1.3 |

Evaluation results on the validation set:

| Metric | Score |
|--------|-------|
| Precision | 0.8010483135824977 |
| Recall | 0.8338275412169375 |
| F1 | 0.8171093159760562 |
| Accuracy | 0.9476653696498054 |
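The reported F1 is consistent with the reported precision and recall: F1 is their harmonic mean, which can be verified directly (using the exact figures from this card):

```python
# Sanity check: F1 should equal the harmonic mean of precision and recall,
# the standard computation for entity-level NER evaluation.
precision = 0.8010483135824977
recall = 0.8338275412169375

f1 = 2 * precision * recall / (precision + recall)
print(f1)  # matches the reported F1 of ~0.8171
```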