🚀 Turkish Named Entity Recognition (NER) Model
This Turkish Named Entity Recognition (NER) model is a fine-tuned version of "dbmdz/bert-base-turkish-cased". It was trained on a reviewed version of a well-known Turkish NER dataset (https://github.com/stefan-it/turkish-bert/files/4558187/nerdata.txt) and identifies person (PER), organization (ORG), and location (LOC) entities in Turkish text.
✨ Features
- Fine-tuned Model: Built on "dbmdz/bert-base-turkish-cased" and fine-tuned for Turkish NER with PER, ORG, and LOC entity labels.
- High Accuracy: Reaches 0.9934 accuracy and 0.9593 F1 (precision 0.9544, recall 0.9643) in the reference test results.
📦 Installation
To use this model, you need the transformers Python library. You can install it using the following command:
pip install transformers
💻 Usage Examples
Basic Usage
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForTokenClassification.from_pretrained("akdeniz27/bert-base-turkish-cased-ner")
tokenizer = AutoTokenizer.from_pretrained("akdeniz27/bert-base-turkish-cased-ner")
# aggregation_strategy="first" groups sub-word tokens; each word takes the label of its first token
ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")
ner("your text here")
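With an aggregation strategy set, the pipeline returns a list of dictionaries, one per detected entity, with the keys entity_group, score, word, start, and end. A minimal sketch of post-processing such a result follows. The sample list is hand-written for illustration (it is not real model output), and group_entities and its min_score threshold are hypothetical helpers, not part of transformers.

```python
from collections import defaultdict

def group_entities(results, min_score=0.0):
    """Collect entity strings by type from an aggregated NER pipeline result."""
    grouped = defaultdict(list)
    for item in results:
        # Skip low-confidence predictions (threshold is an illustrative choice)
        if item["score"] >= min_score:
            grouped[item["entity_group"]].append(item["word"])
    return dict(grouped)

# Hand-written sample in the shape the pipeline returns; scores are made up.
# Offsets refer to: "Mustafa Kemal Atatürk Ankara'da TBMM'yi açtı."
sample = [
    {"entity_group": "PER", "score": 0.99, "word": "Mustafa Kemal Atatürk", "start": 0, "end": 21},
    {"entity_group": "LOC", "score": 0.98, "word": "Ankara", "start": 22, "end": 28},
    {"entity_group": "ORG", "score": 0.40, "word": "TBMM", "start": 32, "end": 36},
]

print(group_entities(sample, min_score=0.5))
```

In real use, sample would be replaced by the return value of the ner(...) call above.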
Please refer to "https://huggingface.co/transformers/_modules/transformers/pipelines/token_classification.html" for entity grouping with the aggregation_strategy parameter.
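As a rough illustration of what aggregation_strategy="first" means: when a word is split into several sub-word tokens whose predicted labels disagree, the whole word takes the label of its first token. The sketch below mimics that rule in plain Python on hand-written token predictions. It is a simplified model of the behavior, not the library's implementation; the function name, the word_id field, and the sample data are assumptions made for this example.

```python
def aggregate_first(tokens):
    """Merge sub-word tokens into words, labelling each word by its first token.

    `tokens` is a list of (piece, word_id, label) tuples. Pieces sharing a
    word_id belong to the same word, and continuation pieces start with "##"
    (the WordPiece convention used by BERT tokenizers).
    """
    words = []
    for piece, word_id, label in tokens:
        if words and words[-1]["word_id"] == word_id:
            # Continuation piece: extend the text, keep the first token's label
            words[-1]["word"] += piece[2:] if piece.startswith("##") else piece
        else:
            words.append({"word_id": word_id, "word": piece, "label": label})
    return [(w["word"], w["label"]) for w in words]

# Hand-written predictions; the second piece disagrees with the first.
tokens = [
    ("Galata", 0, "B-ORG"),
    ("##saray", 0, "I-LOC"),
    ("kazandı", 1, "O"),
]
print(aggregate_first(tokens))
```

The sketch stops at word level; the real pipeline additionally merges adjacent words of the same entity into a single grouped result.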
🔧 Technical Details
Fine-tuning parameters
task = "ner"
model_checkpoint = "dbmdz/bert-base-turkish-cased"
batch_size = 8
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']
max_length = 512
learning_rate = 2e-5
num_train_epochs = 3
weight_decay = 0.01
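The label_list above appears to follow the common BIO tagging scheme: B- marks the first token of an entity, I- a continuation, and O a token outside any entity, for the three types PER, ORG, and LOC. The sketch below builds the id/label mappings that a token-classification model typically carries and decodes a tag sequence into entity spans. The helper bio_to_spans and the sample sentence are hypothetical, written for illustration and not taken from the original training script.

```python
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']

# Integer ids in list order, and the reverse mapping
label2id = {label: i for i, label in enumerate(label_list)}
id2label = {i: label for label, i in label2id.items()}

def bio_to_spans(words, tags):
    """Turn parallel word/tag lists into (entity_type, text) spans."""
    spans = []
    current_type, current_words = None, []
    for word, tag in zip(words, tags):
        if tag.startswith("I-") and current_type == tag[2:]:
            current_words.append(word)  # continue the open entity
            continue
        if current_type is not None:
            spans.append((current_type, " ".join(current_words)))  # close it
        if tag == "O":
            current_type, current_words = None, []
        else:
            # "B-X" opens a new entity; a stray "I-X" is treated as an opening too
            current_type, current_words = tag[2:], [word]
    if current_type is not None:
        spans.append((current_type, " ".join(current_words)))
    return spans

# Hand-written example sentence and tags (not model output)
words = ["Ahmet", "Yılmaz", "Ankara", "Üniversitesi", "için", "İzmir", "yolunda"]
tags = ["B-PER", "I-PER", "B-ORG", "I-ORG", "O", "B-LOC", "O"]
print(bio_to_spans(words, tags))
```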
Reference test results
| Metric | Value |
|---|---|
| Accuracy | 0.9933935699477056 |
| F1 | 0.9592969472710453 |
| Precision | 0.9543530277931161 |
| Recall | 0.9642923563325274 |
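As a quick consistency check on these figures, F1 is the harmonic mean of precision and recall: F1 = 2 * P * R / (P + R). The short sketch below recomputes it from the reported precision and recall; the function name is arbitrary.

```python
import math

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values copied from the reference test results
precision = 0.9543530277931161
recall = 0.9642923563325274
reported_f1 = 0.9592969472710453

computed = f1_score(precision, recall)
# The recomputed value should agree with the reported F1 up to rounding
print(math.isclose(computed, reported_f1, abs_tol=1e-6))
```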
Evaluation results with test sets
| Test Set | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| 20010000 | 0.9946 | 0.9871 | 0.9463 | 0.9662 |
| 20020000 | 0.9928 | 0.9134 | 0.9206 | 0.9170 |
| 20030000 | 0.9942 | 0.9814 | 0.9186 | 0.9489 |
| 20040000 | 0.9943 | 0.9660 | 0.9522 | 0.9590 |
| 20050000 | 0.9971 | 0.9539 | 0.9932 | 0.9732 |
| 20060000 | 0.9993 | 0.9942 | 0.9942 | 0.9942 |
| 20070000 | 0.9970 | 0.9806 | 0.9439 | 0.9619 |
| 20080000 | 0.9988 | 0.9821 | 0.9649 | 0.9735 |
| 20090000 | 0.9977 | 0.9891 | 0.9479 | 0.9681 |
| 20100000 | 0.9961 | 0.9684 | 0.9293 | 0.9485 |
| Overall | 0.9961 | 0.9720 | 0.9516 | 0.9617 |
The evaluation results are based on the test sets proposed in: Küçük, D., Küçük, D., Arıcı, N. 2016. Türkçe Varlık İsmi Tanıma için bir Veri Kümesi ("A Named Entity Recognition Dataset for Turkish"). IEEE Sinyal İşleme, İletişim ve Uygulamaları Kurultayı. Zonguldak, Türkiye.
📄 License
This project is licensed under the MIT license.