bert-base-turkish-cased-ner: Open-Source Model for Identifying Turkish Named Entities such as Person, Organization, and Location Names

Bert Base Turkish Cased Ner

Developed by akdeniz27

A Turkish named entity recognition model fine-tuned from the dbmdz/bert-base-turkish-cased model. It recognizes person names, organization names, and location names.

Sequence Labeling

Transformers

Open Source License: MIT

Tags: Turkish NER, high-precision entity recognition, BERT fine-tuning

Downloads 115.25k

Release Time: 3/2/2022

Model Overview

This model is specifically designed for Turkish named entity recognition tasks, capable of identifying entities such as person names (PER), organization names (ORG), and location names (LOC) in text.

Model Features

High-precision Turkish NER

Achieves an F1 score above 0.96 on multiple test sets, with accuracy exceeding 0.99

BERT-based architecture

Uses dbmdz/bert-base-turkish-cased as the base model, optimized for Turkish

Entity grouping functionality

Supports grouping of sub-word tokens into whole entities via the pipeline's aggregation_strategy parameter

Model Capabilities

Turkish text analysis

Named entity recognition

Person name recognition

Organization name recognition

Location name recognition

Use Cases

Text analysis

News entity extraction

Extract person names, organizations, and location information from Turkish news

Overall F1 score of about 0.96 on the referenced test sets

Social media analysis

Analyze mentioned entities in Turkish social media content

🚀 Turkish Named Entity Recognition (NER) Model

This Turkish Named Entity Recognition (NER) model is a fine-tuned version of "dbmdz/bert-base-turkish-cased". It was trained on a reviewed version of a well-known Turkish NER dataset (https://github.com/stefan-it/turkish-bert/files/4558187/nerdata.txt) and identifies named entities in Turkish text.

✨ Features

Fine-tuned Model: Based on "dbmdz/bert-base-turkish-cased" and fine-tuned specifically for Turkish NER.
High Accuracy: Achieves high accuracy, F1-score, precision, and recall in the reference test results.

📦 Installation

To use this model, install the transformers library (a backend such as PyTorch is also required):

pip install transformers

💻 Usage Examples

Basic Usage

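from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub
model = AutoModelForTokenClassification.from_pretrained("akdeniz27/bert-base-turkish-cased-ner")
tokenizer = AutoTokenizer.from_pretrained("akdeniz27/bert-base-turkish-cased-ner")

# aggregation_strategy="first" labels each word by its first sub-word token and groups adjacent entity tokens
ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")
print(ner("your text here"))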
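When an aggregation_strategy is set, the pipeline returns a list of dictionaries with the keys entity_group, score, word, start, and end. The following is a minimal sketch of post-processing such output. The sample list is hand-written for illustration and is not actual model output; the helper name group_by_type and the 0.90 threshold are assumptions.

```python
from collections import defaultdict

text = "Mustafa Kemal Atatürk 1919'da Samsun'a çıktı."

# Hand-written sample in the shape the pipeline returns when an
# aggregation_strategy is set. These are NOT real model predictions.
sample_output = [
    {"entity_group": "PER", "score": 0.99, "word": "Mustafa Kemal Atatürk", "start": 0, "end": 21},
    {"entity_group": "LOC", "score": 0.62, "word": "Samsun", "start": 30, "end": 36},
]

def group_by_type(entities, min_score=0.90):
    """Collect entity strings per type, dropping spans below min_score."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)

strict = group_by_type(sample_output)                  # keeps only confident spans
lenient = group_by_type(sample_output, min_score=0.5)  # keeps both spans
print(strict)
print(lenient)
```

The start and end offsets index into the input string, so text[start:end] recovers the surface form of each entity.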

Please refer to "https://huggingface.co/transformers/_modules/transformers/pipelines/token_classification.html" for entity grouping with the aggregation_strategy parameter.
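As a rough illustration of what entity grouping does, the sketch below merges word-level BIO tags (the label scheme used by this model) into entity spans. It is a simplified stand-in and not the transformers implementation, which also has to reconcile sub-word tokens and scores. The function name merge_bio and the hand-assigned sample tags are hypothetical.

```python
def merge_bio(words, tags):
    """Merge word-level BIO tags into (entity_type, text) spans."""
    spans = []
    current_type, current_words = None, []
    for word, tag in zip(words, tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "B" or (prefix == "I" and etype != current_type):
            # A new entity starts: flush any open span first.
            if current_words:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = etype, [word]
        elif prefix == "I":
            # Continuation of the currently open entity.
            current_words.append(word)
        else:
            # "O": outside any entity, so close the open span if there is one.
            if current_words:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
    if current_words:
        spans.append((current_type, " ".join(current_words)))
    return spans

# Hand-assigned tags for illustration, not model predictions.
words = ["Orhan", "Pamuk", "İstanbul", "Üniversitesi", "mezunudur", "."]
tags = ["B-PER", "I-PER", "B-ORG", "I-ORG", "O", "O"]
spans = merge_bio(words, tags)
print(spans)
```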

🔧 Technical Details

Fine-tuning parameters

task = "ner"
model_checkpoint = "dbmdz/bert-base-turkish-cased"
batch_size = 8 
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']
max_length = 512 
learning_rate = 2e-5 
num_train_epochs = 3 
weight_decay = 0.01
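Token-classification models in transformers map between integer class ids and label strings. The sketch below derives such mappings from the label_list above. The names id2label and label2id mirror the config attribute names used by transformers, but whether the author built them this way is an assumption; the published checkpoint's own config is authoritative for the actual id order.

```python
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']

# The index position in label_list becomes the class id.
id2label = dict(enumerate(label_list))
label2id = {label: idx for idx, label in id2label.items()}

# Entity types are the distinct suffixes after the B-/I- prefix.
entity_types = sorted({label.split("-", 1)[1] for label in label_list if label != "O"})

print(id2label)
print(entity_types)
```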

Reference test results

Metric	Value
Accuracy	0.9933935699477056
F1	0.9592969472710453
Precision	0.9543530277931161
Recall	0.9642923563325274
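F1 is the harmonic mean of precision and recall, so the reported value can be sanity-checked from the other two rows. A minimal sketch using only the numbers in the table above:

```python
precision = 0.9543530277931161
recall = 0.9642923563325274
reported_f1 = 0.9592969472710453

# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"recomputed F1 = {f1:.6f}, reported F1 = {reported_f1:.6f}")
```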

Evaluation results with test sets

Test Set	Accuracy	Precision	Recall	F1-Score
20010000	0.9946	0.9871	0.9463	0.9662
20020000	0.9928	0.9134	0.9206	0.9170
20030000	0.9942	0.9814	0.9186	0.9489
20040000	0.9943	0.9660	0.9522	0.9590
20050000	0.9971	0.9539	0.9932	0.9732
20060000	0.9993	0.9942	0.9942	0.9942
20070000	0.9970	0.9806	0.9439	0.9619
20080000	0.9988	0.9821	0.9649	0.9735
20090000	0.9977	0.9891	0.9479	0.9681
20100000	0.9961	0.9684	0.9293	0.9485
Overall	0.9961	0.9720	0.9516	0.9617

The evaluation results are based on the test sets proposed in "Küçük, D., Küçük, D., Arıcı, N. 2016. Türkçe Varlık İsmi Tanıma için bir Veri Kümesi ("A Named Entity Recognition Dataset for Turkish"). IEEE Sinyal İşleme, İletişim ve Uygulamaları Kurultayı. Zonguldak, Türkiye.".
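The source does not say how the Overall row is aggregated. A quick check, using nothing beyond the values in the table, shows that the unweighted mean of the per-set F1 scores differs slightly from the reported Overall F1. This suggests, but does not prove, that the Overall row is computed over the pooled test data and is not a simple average.

```python
# F1 scores per test set, copied from the table above.
f1_by_test_set = {
    "20010000": 0.9662,
    "20020000": 0.9170,
    "20030000": 0.9489,
    "20040000": 0.9590,
    "20050000": 0.9732,
    "20060000": 0.9942,
    "20070000": 0.9619,
    "20080000": 0.9735,
    "20090000": 0.9681,
    "20100000": 0.9485,
}
reported_overall_f1 = 0.9617

# Unweighted (macro) mean across the ten test sets.
macro_f1 = sum(f1_by_test_set.values()) / len(f1_by_test_set)

# Test set with the lowest F1 score.
weakest = min(f1_by_test_set, key=f1_by_test_set.get)

print(f"macro mean F1 = {macro_f1:.5f}, reported overall F1 = {reported_overall_f1}")
print(f"lowest-scoring test set: {weakest}")
```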

📄 License

This project is licensed under the MIT license.
