🚀 bert-base-turkish-uncased-ner
This model is a fine-tuned BERT model for Turkish named entity recognition (NER), reaching an F1 score of 0.7821 on the turkish-wiki_ner evaluation set.
🚀 Quick Start
This model is a fine-tuned version of [dbmdz/bert-base-turkish-uncased](https://huggingface.co/dbmdz/bert-base-turkish-uncased) on the turkish-wiki_ner dataset. It achieves the following results on the evaluation set (epoch 4):
- Validation Loss: 0.2603
- F1: 0.7821
✨ Features
Model Details
| Property | Details |
|----------|---------|
| Model Type | Fine-tuned version of dbmdz/bert-base-turkish-uncased |
| Training Data | turkish-wiki_ner dataset (18,967 training samples, 1,000 validation samples from Wikipedia data) |
Labels
- CARDINAL
- DATE
- EVENT
- FAC
- GPE
- LANGUAGE
- LAW
- LOC
- MONEY
- NORP
- ORDINAL
- ORG
- PERCENT
- PERSON
- PRODUCT
- QUANTITY
- TIME
- TITLE
- WORK_OF_ART
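The list above gives the entity types only; whether the checkpoint's labels carry B-/I- span prefixes is determined by the fine-tuning run. A minimal sketch for reading the label map shipped with the checkpoint:

```python
from transformers import AutoConfig

# Read the id-to-label map stored in the checkpoint's config.
config = AutoConfig.from_pretrained("saribasmetehan/bert-base-turkish-uncased-ner")
for idx in sorted(config.id2label):
    print(idx, config.id2label[idx])
```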
Fine-Tuning Process
For more information about the fine-tuning process, please visit: [Fine-Tuning Process](https://github.com/saribasmetehan/bert-base-turkish-uncased-ner)
💻 Usage Examples
Basic Usage
```python
from transformers import pipeline
import pandas as pd

text = "Bu toplam sıfır ise, Newton'ın birinci yasası cismin hareket durumunun değişmeyeceğini söyler."
model_id = "saribasmetehan/bert-base-turkish-uncased-ner"

# Aggregate subword predictions into whole-entity spans.
ner = pipeline("ner", model=model_id)
preds = ner(text, aggregation_strategy="simple")
pd.DataFrame(preds)
```
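With `aggregation_strategy="simple"`, the pipeline merges subword tokens into whole entities, so each row of the resulting DataFrame has `entity_group`, `score`, `word`, `start`, and `end` columns.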
Advanced Usage
```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_name = "saribasmetehan/bert-base-turkish-uncased-ner"

# Load the tokenizer and the fine-tuned token-classification head directly.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
```
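Without the pipeline, inference is a forward pass followed by an argmax over the per-token logits; a minimal sketch, reusing the tokenizer and model loaded above and the example sentence from Basic Usage:

```python
import torch

text = "Bu toplam sıfır ise, Newton'ın birinci yasası cismin hareket durumunun değişmeyeceğini söyler."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, num_labels)

# Map each subword token to its highest-scoring label.
pred_ids = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, pred_ids):
    print(token, model.config.id2label[pred.item()])
```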
🔧 Technical Details
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 4
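For reference, these settings map directly onto `TrainingArguments` from transformers; a minimal sketch of an equivalent configuration (`output_dir` is a placeholder, and the dataset/`Trainer` wiring is omitted). The Adam betas and epsilon listed above are the library defaults, so they need no explicit arguments:

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; output_dir is a placeholder name.
training_args = TrainingArguments(
    output_dir="bert-base-turkish-uncased-ner",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=4,
)
```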
Training results
| Training Loss | Epoch | Step | Validation Loss | F1 |
|---------------|-------|------|-----------------|------|
| 0.4 | 1.0 | 1186 | 0.2502 | 0.7703 |
| 0.2227 | 2.0 | 2372 | 0.2439 | 0.7740 |
| 0.1738 | 3.0 | 3558 | 0.2511 | 0.7783 |
| 0.1474 | 4.0 | 4744 | 0.2603 | 0.7821 |
Framework versions
- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1
📄 License
This model is released under the MIT license.