ner-vietnamese-electra-base Open Source Model - Freely Identify Named Entities in Vietnamese


Ner Vietnamese Electra Base

Developed by NlpHUST

Vietnamese named entity recognition model based on the ELECTRA architecture, fine-tuned on the VLSP 2018 dataset

Sequence Labeling

Transformers

Other

#Vietnamese NER #High-precision entity recognition #Police information processing

Downloads: 41.60k

Release date: 10/28/2022

Model Overview

This model is specifically designed for named entity recognition tasks in Vietnamese text, capable of identifying entities such as persons, locations, and organizations.

Model Features

High-precision entity recognition

Achieves F1 scores of 96.64% for persons and 93.65% for locations on the evaluation set.
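F1 is the harmonic mean of precision $P$ and recall $R$. Plugging in the per-type precision and recall from the evaluation results reproduces both figures (up to rounding of the reported four-decimal values):

$$
\begin{aligned}
F_1 &= \frac{2PR}{P + R} \\
F_1^{\text{PER}} &= \frac{2 \times 0.9692 \times 0.9637}{0.9692 + 0.9637} \approx 0.9664 \\
F_1^{\text{LOC}} &= \frac{2 \times 0.9353 \times 0.9377}{0.9353 + 0.9377} \approx 0.9365
\end{aligned}
$$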

Based on ELECTRA architecture

Builds on NlpHUST/electra-base-vn, a pre-trained Vietnamese ELECTRA base model, which supplies the contextual representations used for tagging.

Multi-category entity recognition

Recognizes four entity types: persons, locations, organizations, and miscellaneous entities.

Model Capabilities

Vietnamese text processing

Named entity recognition

Entity classification

Use Cases

News analysis

News entity extraction

Extract key persons, locations, and organizations from Vietnamese news

Can be used for news categorization and event analysis

Social media monitoring

Social media content analysis

Identify key entities in social media text

Helps monitor trending topics and key figures
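For the monitoring use case, entity mentions can be tallied across many posts to surface frequently mentioned people, places, and organizations. The sketch below is a hypothetical helper (`top_entities` is not part of any library); it assumes each post has already been run through a Transformers `pipeline("ner", ..., aggregation_strategy="simple")`, whose grouped output is a list of dicts with `entity_group`, `word`, and `score` keys. The label values (e.g. `"PERSON"`) and the `min_score` threshold are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_entities(
    batches: Iterable[List[Dict]], min_score: float = 0.5, k: int = 3
) -> List[Tuple[Tuple[str, str], int]]:
    """Count (entity_group, word) mentions across many texts.

    Hypothetical helper: `batches` holds one aggregated NER result per text.
    """
    counts: Counter = Counter()
    for entities in batches:
        for ent in entities:
            # Skip low-confidence spans (threshold is an assumed default).
            if ent["score"] >= min_score:
                counts[(ent["entity_group"], ent["word"])] += 1
    # Most frequent first; ties keep first-seen order.
    return counts.most_common(k)
```

Counting the `(label, surface form)` pair rather than the surface form alone keeps, say, a place name and an organization with the same spelling apart.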

🚀 Vietnamese Named Entity Recognition (NER) Model

This is a Vietnamese Named Entity Recognition (NER) model. It is a fine-tuned version of [NlpHUST/electra-base-vn](https://huggingface.co/NlpHUST/electra-base-vn), which can effectively identify named entities in Vietnamese text.

🚀 Quick Start

This model is a fine-tuned version of [NlpHUST/electra-base-vn](https://huggingface.co/NlpHUST/electra-base-vn) on the VLSP 2018 dataset. It achieves the following results on the evaluation set:

Loss: 0.0580

| Entity type | Precision | Recall | F1 | Number (gold entities) |
|---|---|---|---|---|
| Location | 0.9353 | 0.9377 | 0.9365 | 2360 |
| Miscellaneous | 0.5660 | 0.6897 | 0.6218 | 174 |
| Organization | 0.8610 | 0.9068 | 0.8833 | 1878 |
| Person | 0.9692 | 0.9637 | 0.9664 | 2121 |
| Overall | 0.9122 | 0.9307 | 0.9214 | |

Overall accuracy: 0.9907
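The Overall row appears consistent with micro-averaging, where entity counts are pooled across all types before precision and recall are computed (presumably via the seqeval-based metric common in Hugging Face NER fine-tuning). The sketch below checks this. The true-positive and predicted-entity counts it uses are not reported in the source; they are inferred by rounding `recall × number` and `tp / precision`, so treat it as a consistency check, not the original evaluation code.

```python
# Per-type (precision, recall, number of gold entities) from the results above.
REPORTED = {
    "Location": (0.9353, 0.9377, 2360),
    "Miscellaneous": (0.5660, 0.6897, 174),
    "Organization": (0.8610, 0.9068, 1878),
    "Person": (0.9692, 0.9637, 2121),
}


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def micro_average(reported):
    """Pool inferred counts over entity types, then compute P, R and F1.

    The counts are NOT in the source: they are inferred by rounding.
    """
    tp_total = pred_total = gold_total = 0
    for precision, recall, gold in reported.values():
        tp = round(recall * gold)     # inferred correctly predicted entities
        pred = round(tp / precision)  # inferred total predicted entities
        tp_total += tp
        pred_total += pred
        gold_total += gold
    p = tp_total / pred_total
    r = tp_total / gold_total
    return p, r, f1(p, r)


if __name__ == "__main__":
    for name, (p, r, _) in REPORTED.items():
        print(f"{name:14s} F1 = {f1(p, r):.4f}")
    p, r, f = micro_average(REPORTED)
    print(f"Overall        P = {p:.4f}  R = {r:.4f}  F1 = {f:.4f}")
```

The per-type F1 values recomputed this way match the table to four decimals. Micro-averaging also explains why the overall F1 (0.9214) stays high despite the weak Miscellaneous class: that class contributes only 174 of the 6533 gold entities.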

✨ Features

Fine-tuned Model: Based on [NlpHUST/electra-base-vn](https://huggingface.co/NlpHUST/electra-base-vn) and fine-tuned on the VLSP 2018 dataset.
Multiple Entity Types: Identifies four entity types: locations, organizations, persons, and miscellaneous entities.
High Performance: Reaches an overall F1 of 0.9214 (precision 0.9122, recall 0.9307) on the evaluation set; the Miscellaneous class is much weaker, at an F1 of 0.6218.

💻 Usage Examples

Basic Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load the tokenizer and the fine-tuned token-classification model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("NlpHUST/ner-vietnamese-electra-base")
model = AutoModelForTokenClassification.from_pretrained("NlpHUST/ner-vietnamese-electra-base")

# Build an NER pipeline; it returns token-level predictions (tokens tagged O are omitted)
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "Liên quan vụ việc CSGT bị tố đánh dân, trúng một cháu nhỏ đang ngủ, đang lan truyền trên mạng xã hội, Đại tá Nguyễn Văn Tảo, Phó Giám đốc Công an tỉnh Tiền Giang vừa có cuộc họp cùng Chỉ huy Công an huyện Châu Thành và một số đơn vị nghiệp vụ cấp tỉnh để chỉ đạo làm rõ thông tin."

# Run inference and print the raw predictions
ner_results = nlp(example)
print(ner_results)
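The pipeline above returns one prediction per token, so a multi-word name arrives as several `B-`/`I-` tagged pieces. Transformers can merge these itself via `pipeline("ner", ..., aggregation_strategy="simple")`; the sketch below shows the underlying idea for post-processing raw output. It assumes BIO tags of the form `B-XXX`/`I-XXX` and the per-token keys `entity`, `score`, `index`, `start`, and `end` returned by the non-aggregated pipeline. The function name `group_bio_entities` is hypothetical, and the exact label names this model emits are an assumption.

```python
from typing import Dict, List, Optional


def group_bio_entities(text: str, tokens: List[Dict]) -> List[Dict]:
    """Merge token-level BIO predictions into entity spans.

    Assumes `tokens` is the output of a "ner" pipeline WITHOUT aggregation,
    with 'O' tokens already filtered out.
    """
    spans: List[Dict] = []
    current: Optional[Dict] = None
    for tok in tokens:
        prefix, _, label = tok["entity"].partition("-")  # "B-PERSON" -> "B", "PERSON"
        continues = (
            current is not None
            and prefix == "I"
            and label == current["label"]
            and tok["index"] == current["last_index"] + 1  # no token gap
        )
        if continues:
            current["end"] = tok["end"]
            current["last_index"] = tok["index"]
            current["scores"].append(tok["score"])
            continue
        if current is not None:
            spans.append(current)
        # A B- tag, or an I- tag that cannot extend the open span, starts a new one.
        current = {
            "label": label,
            "start": tok["start"],
            "end": tok["end"],
            "last_index": tok["index"],
            "scores": [tok["score"]],
        }
    if current is not None:
        spans.append(current)
    # Slice each surface string from the original text using character offsets.
    return [
        {
            "label": s["label"],
            "text": text[s["start"]:s["end"]],
            "start": s["start"],
            "end": s["end"],
            "score": sum(s["scores"]) / len(s["scores"]),
        }
        for s in spans
    ]
```

With the basic example, this would be called as `group_bio_entities(example, ner_results)`. Using character offsets rather than concatenating the `word` fields avoids having to undo subword markers from the tokenizer.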

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

🔧 Technical Details

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 4
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 10.0
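These settings correspond to Transformers `TrainingArguments` fields (`learning_rate`, `per_device_train_batch_size`, `per_device_eval_batch_size`, `seed`, `adam_beta1`, `adam_beta2`, `adam_epsilon`, `lr_scheduler_type`, `num_train_epochs`). The `linear` scheduler decays the learning rate from 5e-05 to zero over all training steps. The minimal sketch below reproduces that schedule in plain Python, assuming no warmup (none is listed, and the Transformers default is 0 warmup steps), a single device, and no gradient accumulation. The training-set size of 1000 examples is purely hypothetical, since the card does not state it.

```python
import math

BASE_LR = 5e-5    # learning_rate
TRAIN_BATCH = 16  # train_batch_size
EPOCHS = 10       # num_epochs


def total_steps(num_examples: int, batch_size: int = TRAIN_BATCH, epochs: int = EPOCHS) -> int:
    """Optimizer steps for single-device training without gradient accumulation."""
    return math.ceil(num_examples / batch_size) * epochs


def linear_lr(step: int, total: int, base_lr: float = BASE_LR, warmup: int = 0) -> float:
    """Optional linear warmup, then linear decay to 0 at the final step."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    steps = total_steps(1000)  # hypothetical training-set size
    for s in (0, steps // 2, steps):
        print(s, linear_lr(s, steps))
```

Halfway through training the learning rate is half the base value, and it reaches zero exactly at the last step.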

Framework versions

Transformers 4.20.1
Pytorch 1.8.0+cu111
Datasets 2.4.0
Tokenizers 0.12.1

Contact information

For personal communication related to this project, please contact Nha Nguyen Van (nha282@gmail.com).

📄 License

No license information provided in the original document.
