bert-base-turkish-ner-cased Open-source Model - For Turkish Text Entity Recognition Tasks


Developed by savasy

This is a BERT-based Turkish named entity recognition model, suitable for entity recognition tasks in Turkish texts.

Tags: Sequence Labeling, Other, Turkish Named Entity Recognition, High-precision NER, Transfer Learning Optimization

Downloads 1,269

Release Time: 3/2/2022

Model Overview

The model fine-tunes a pre-trained Turkish BERT checkpoint (dbmdz/bert-base-turkish-cased) through transfer learning. It identifies entities such as person names, locations, and time expressions in Turkish text.

Model Features

High-precision Entity Recognition

Reaches an F1 score of about 0.925 on the first evaluation dataset and about 0.95 on the second (see Some Results).

BERT-based Architecture

Starts from a pre-trained BERT model and fine-tunes it for NER instead of training from scratch.

Transfer Learning

Achieves high performance on limited datasets through transfer learning techniques.

Model Capabilities

Recognize named entities in Turkish texts

Handle specific linguistic features of Turkish

Support recognition of multiple entity types

Use Cases

Text Analysis

Historical Text Analysis

Analyze person, location, and time information in Turkish historical texts

Can accurately recognize names like 'Mustafa Kemal Atatürk' and dates like '19 Mayıs 1919'

News Content Analysis

Extract key entity information from Turkish news articles

🚀 Easy-to-use NER Application for Turkish Language

This is an easy-to-use NER (Named Entity Recognition) model for the Turkish language, built in Python with BERT and transfer learning.

📚 Citation

Please cite the following works if you use this model in your study:

@misc{yildirim2024finetuning,
      title={Fine-tuning Transformer-based Encoder for Turkish Language Understanding Tasks}, 
      author={Savas Yildirim},
      year={2024},
      eprint={2401.17396},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@book{yildirim2021mastering,
  title={Mastering Transformers: Build state-of-the-art models from scratch with advanced natural language processing techniques},
  author={Yildirim, Savas and Asgari-Chenaghlu, Meysam},
  year={2021},
  publisher={Packt Publishing Ltd}
}

📦 Installation

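Thanks to @stefan-it, the data needed for training can be fetched with the following steps:

# Create the data folder, then download each split and the label list into it
mkdir -p tr-data
cd tr-data
for file in train.txt dev.txt test.txt labels.txt
do
  wget "https://schweter.eu/storage/turkish-bert-wikiann/$file"
done
cd ..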

This will download the pre-processed datasets with training, development, and test splits and put them in a tr-data folder.
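The layout of the downloaded files is not described here. The following is a minimal sketch for inspecting them, assuming the CoNLL-style layout that token-classification scripts commonly consume: one `token label` pair per line, with a blank line between sentences. The function name `read_conll` and the sample lines are hypothetical and not taken from the real files.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group 'token label' lines into sentences (assumed CoNLL-style layout)."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        # Split on the last whitespace so the label is always the final field
        token, label = line.rsplit(maxsplit=1)
        current.append((token, label))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample, not taken from the real files
sample = ["Mustafa B-PER", "Kemal I-PER", "", "Samsun B-LOC"]
print(read_conll(sample))
```

If the real files follow this layout, passing an open file handle such as `open("tr-data/train.txt", encoding="utf-8")` to `read_conll` should work the same way.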

🔧 Fine-tuning

After downloading the dataset, fine-tuning of the pre-trained BERT model can be started. First set the following environment variables:

export MAX_LENGTH=128
export BERT_MODEL=dbmdz/bert-base-turkish-cased 
export OUTPUT_DIR=tr-new-model
export BATCH_SIZE=32
export NUM_EPOCHS=3
export SAVE_STEPS=625
export SEED=1

Then run the training script:

python3 run_ner_old.py --data_dir ./tr-data \
--model_type bert \
--labels ./tr-data/labels.txt \
--model_name_or_path $BERT_MODEL \
--output_dir $OUTPUT_DIR-$SEED \
--max_seq_length $MAX_LENGTH \
--num_train_epochs $NUM_EPOCHS \
--per_gpu_train_batch_size $BATCH_SIZE \
--save_steps $SAVE_STEPS \
--seed $SEED \
--do_train \
--do_eval \
--do_predict \
--fp16
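The batch size, epoch count, and save interval together determine how long training runs and when checkpoints are written. The sketch below works through that arithmetic, assuming a single GPU, no gradient accumulation, and a final partial batch that is kept. The training-set size of 20,000 sentences is a hypothetical figure for illustration only; it is not stated in this document. `training_schedule` is a local helper, not part of any library.

```python
import math


def training_schedule(num_examples: int, batch_size: int, num_epochs: int, save_steps: int):
    """Return (steps per epoch, total steps, steps at which checkpoints are written)."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * num_epochs
    checkpoints = list(range(save_steps, total_steps + 1, save_steps))
    return steps_per_epoch, total_steps, checkpoints


# 20_000 is a hypothetical training-set size; the other values mirror the exports above
steps_per_epoch, total_steps, checkpoints = training_schedule(
    num_examples=20_000, batch_size=32, num_epochs=3, save_steps=625
)
print(steps_per_epoch, total_steps, checkpoints)
```

Under these assumptions, a save interval of 625 steps would coincide with the end of each epoch. With a different dataset size the checkpoints would fall elsewhere.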

💻 Usage Examples

Basic Usage

from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForTokenClassification.from_pretrained("savasy/bert-base-turkish-ner-cased")
tokenizer = AutoTokenizer.from_pretrained("savasy/bert-base-turkish-ner-cased")

# Build a token-classification pipeline and tag a sample sentence
ner = pipeline('ner', model=model, tokenizer=tokenizer)
print(ner("Mustafa Kemal Atatürk 19 Mayıs 1919'da Samsun'a ayak bastı."))
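With the default settings the pipeline returns one prediction per token, so a multi-word name arrives as several separate entries. The following sketch merges such entries into entity spans using their character offsets. It assumes a B-/I- label scheme (for example `B-PER`, `I-PER`) and the `entity`, `start`, and `end` keys of the pipeline output. The sample predictions are hypothetical and were written by hand; they are not real model output. `merge_entities` is a local helper.

```python
from typing import Dict, List


def merge_entities(text: str, predictions: List[Dict]) -> List[Dict]:
    """Merge adjacent B-/I- predictions of the same type into character spans."""
    spans: List[Dict] = []
    for pred in predictions:
        prefix, _, label = pred["entity"].partition("-")
        if not label:
            continue  # skip "O" or any tag without an entity type
        prev = spans[-1] if spans else None
        if prev is not None and prev["label"] == label:
            gap = pred["start"] - prev["end"]
            # Extend on an I- tag one space away, or on a directly attached word piece
            if (prefix == "I" and gap <= 1) or gap == 0:
                prev["end"] = pred["end"]
                continue
        spans.append({"label": label, "start": pred["start"], "end": pred["end"]})
    # Recover the surface text from the character offsets
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


# Hypothetical, hand-written predictions for illustration only
text = "Mustafa Kemal Atatürk Samsun'a ayak bastı."
predictions = [
    {"entity": "B-PER", "word": "Mustafa", "start": 0, "end": 7},
    {"entity": "I-PER", "word": "Kemal", "start": 8, "end": 13},
    {"entity": "I-PER", "word": "Atatürk", "start": 14, "end": 21},
    {"entity": "B-LOC", "word": "Samsun", "start": 22, "end": 28},
]
for span in merge_entities(text, predictions):
    print(span["label"], span["text"])
```

Recent versions of transformers also accept `aggregation_strategy="simple"` in `pipeline(...)`, which performs a similar grouping inside the library; the manual version here only makes the idea explicit.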

📊 Some Results

Data1: the dataset downloaded in the steps above

Eval Results:

precision = 0.916400580551524
recall = 0.9342309684101502
f1 = 0.9252298787412536
loss = 0.11335893666411284

Test Results:

precision = 0.9192058759362955
recall = 0.9303010230367262
f1 = 0.9247201697271198
loss = 0.11182546521618497

Data2: the dataset provided by @kemalaraz (https://github.com/stefan-it/turkish-bert/files/4558187/nerdata.txt)

The performance on this data is as follows:

savas@savas-lenova:~/Desktop/trans/tr-new-model-1$ cat eval_results.txt
* precision = 0.9461980692049029
* recall = 0.959309358847465
* f1 = 0.9527086063783312
* loss = 0.037054269206847804

savas@savas-lenova:~/Desktop/trans/tr-new-model-1$ cat test_results.txt
* precision = 0.9458370635631155
* recall = 0.9588201928530913
* f1 = 0.952284378344882
* loss = 0.035431676572445225
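The f1 values in the listings above can be cross-checked against precision and recall, since F1 is their harmonic mean: F1 = 2PR / (P + R). The short sketch below recomputes it, assuming the evaluation script used this standard definition. `f1_score` is a local helper defined here, not a library function.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1) copied from the result listings above
reported = {
    "Data1 eval": (0.916400580551524, 0.9342309684101502, 0.9252298787412536),
    "Data1 test": (0.9192058759362955, 0.9303010230367262, 0.9247201697271198),
    "Data2 eval": (0.9461980692049029, 0.959309358847465, 0.9527086063783312),
    "Data2 test": (0.9458370635631155, 0.9588201928530913, 0.952284378344882),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed f1 = {f1_score(p, r):.4f}, reported = {f1:.4f}")
```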
