Distilbert-fa-zwnj-base-ner: Open-Source Model for Named Entity Recognition of 10 Persian Entity Types

Distilbert Fa Zwnj Base Ner

Developed by HooshvareLab

A DistilBERT model fine-tuned for Persian Named Entity Recognition (NER) tasks, supporting recognition of 10 entity types.

Sequence Labeling

Transformers

Other #Persian Named Entity Recognition #Multi-dataset Training #10-class Entity Annotation


Release Time: 3/2/2022

Model Overview

This model is based on the DistilBERT architecture and is fine-tuned for named entity recognition in Persian text. It identifies 10 entity types, including dates, events, facilities, and locations.

Model Features

Multi-entity Type Support

Capable of recognizing 10 different entity types, including dates, locations, persons, etc.

Efficient and Lightweight

Because it is based on the DistilBERT architecture, it needs fewer computational resources than a full-size BERT model. It still reaches an overall F1 of 0.9507 on the test set.

Hybrid Dataset Training

Trained using three Persian NER datasets: ARMAN, PEYMA, and WikiANN.

Model Capabilities

Persian text entity recognition

Multi-category entity classification

Sequence labeling

Use Cases

Text Analysis

News Entity Extraction

Extract information such as person names, organization names, and locations from Persian news.

Overall F1 score of 0.9507 on the combined ARMAN, PEYMA, and WikiANN test set
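A short filter over the pipeline's grouped results is enough to keep only person, organization, and location mentions, the types named above. This is a sketch. It assumes the pipeline was created with `aggregation_strategy="simple"`, so each result is a dict with `entity_group`, `word`, and `score` keys. The function name `extract_news_entities` and the `min_score` threshold are hypothetical choices. The sample input is made up, not real model output.

```python
from typing import Dict, Iterable, List

# The three types the news use case mentions (hypothetical default).
NEWS_TYPES = ("PER", "ORG", "LOC")


def extract_news_entities(
    entities: Iterable[Dict],
    keep: Iterable[str] = NEWS_TYPES,
    min_score: float = 0.5,  # hypothetical confidence cut-off
) -> Dict[str, List[str]]:
    """Bucket grouped NER results by entity type, keeping only `keep` types."""
    buckets: Dict[str, List[str]] = {t: [] for t in keep}
    for ent in entities:
        label = ent["entity_group"]
        # Drop types we are not interested in and low-confidence predictions.
        if label in buckets and ent["score"] >= min_score:
            buckets[label].append(ent["word"])
    return buckets


# Made-up results in the shape produced by aggregation_strategy="simple".
sample = [
    {"entity_group": "PER", "word": "Kane", "score": 0.98, "start": 0, "end": 4},
    {"entity_group": "DAT", "word": "2013", "score": 0.91, "start": 10, "end": 14},
    {"entity_group": "PER", "word": "Undertaker", "score": 0.40, "start": 20, "end": 30},
]
print(extract_news_entities(sample))
```

The low-confidence "Undertaker" entry is dropped by the threshold. The DAT entry is dropped because DAT is not in `keep`.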

Social Media Analysis

Identify entity information in Persian social media content.

🚀 DistilbertNER

This model is fine-tuned for the Named Entity Recognition (NER) task on a mixed NER dataset collected from ARMAN, [PEYMA](http://nsurl.org/2019-2/tasks/task-7-named-entity-recognition-ner-for-farsi/), and [WikiANN](https://elisa-ie.github.io/wikiann/). The dataset covers ten entity types:

Date (DAT)
Event (EVE)
Facility (FAC)
Location (LOC)
Money (MON)
Organization (ORG)
Percent (PCT)
Person (PER)
Product (PRO)
Time (TIM)
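The dataset statistics later in this card list each type with a `B-` (beginning) and an `I-` (inside) prefix, which is BIO tagging. Tokens outside any entity get `O`. The sketch below shows how the ten types expand into a 21-label tag set. The label ordering here is an assumption. For real use, read the mapping from the loaded model's `model.config.id2label`.

```python
from typing import Dict, List

# The ten entity types listed above.
ENTITY_TYPES = ["DAT", "EVE", "FAC", "LOC", "MON", "ORG", "PCT", "PER", "PRO", "TIM"]


def build_bio_labels(types: List[str]) -> List[str]:
    """Return the BIO tag set: 'O' plus one B- and one I- tag per entity type."""
    labels = ["O"]
    for t in types:
        labels.extend([f"B-{t}", f"I-{t}"])
    return labels


LABELS = build_bio_labels(ENTITY_TYPES)
# Hypothetical id order; the released model's config may order labels differently.
label2id: Dict[str, int] = {label: i for i, label in enumerate(LABELS)}
id2label: Dict[int, str] = {i: label for label, i in label2id.items()}

print(len(LABELS))  # 21
```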

✨ Features

Fine-tuned for NER on a diverse dataset.
Covers ten different types of entities.

📦 Installation

Installing requirements

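pip install transformers

The PyTorch example below also needs PyTorch installed (or TensorFlow for the commented-out TensorFlow line).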

💻 Usage Examples

Basic Usage

from transformers import AutoTokenizer
from transformers import AutoModelForTokenClassification  # for PyTorch
from transformers import TFAutoModelForTokenClassification  # for TensorFlow
from transformers import pipeline


model_name_or_path = "HooshvareLab/distilbert-fa-zwnj-base-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForTokenClassification.from_pretrained(model_name_or_path)  # PyTorch
# model = TFAutoModelForTokenClassification.from_pretrained(model_name_or_path)  # TensorFlow

# Token-level NER pipeline: one prediction per token not tagged "O"
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
# "He died in 2013, and The Undertaker and Kane held a memorial service for him."
example = "در سال ۲۰۱۳ درگذشت و آندرتیکر و کین برای او مراسم یادبود گرفتند."

ner_results = nlp(example)
print(ner_results)
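By default the `"ner"` pipeline returns one dict per token. A multi-token name therefore comes back as separate `B-PER` and `I-PER` entries.

The sketch below merges those entries into entity spans using BIO rules:

- An `I-` tag of the same type on the immediately following token extends the open span.
- Anything else starts a new span. This includes a `B-` tag, a change of type, or a gap in token indices, since the pipeline drops `O` tokens.

It assumes each dict carries the `entity`, `index`, `start`, and `end` keys; the character offsets require a fast tokenizer. `group_bio_tokens` is a hypothetical helper name, and the sample input is made up.

```python
from typing import Dict, List


def group_bio_tokens(text: str, tokens: List[Dict]) -> List[Dict]:
    """Merge token-level BIO predictions into entity spans with their text."""
    spans: List[Dict] = []
    prev_index = None
    for tok in sorted(tokens, key=lambda t: t["index"]):
        prefix, _, etype = tok["entity"].partition("-")  # "I-PER" -> ("I", "-", "PER")
        # The pipeline drops "O" tokens, so a gap in indices ends the current span.
        adjacent = prev_index is not None and tok["index"] == prev_index + 1
        if prefix == "I" and adjacent and spans and spans[-1]["type"] == etype:
            spans[-1]["end"] = tok["end"]  # extend the open span
        else:
            spans.append({"type": etype, "start": tok["start"], "end": tok["end"]})
        prev_index = tok["index"]
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


# Made-up token predictions (not real model output) for an English stand-in sentence.
text = "Ali Rezaei visited Tehran"
tokens = [
    {"entity": "B-PER", "index": 1, "start": 0, "end": 3, "score": 0.99},
    {"entity": "I-PER", "index": 2, "start": 4, "end": 10, "score": 0.97},
    {"entity": "B-LOC", "index": 4, "start": 19, "end": 25, "score": 0.99},
]
for span in group_bio_tokens(text, tokens):
    print(span["type"], span["text"])
```

Passing `aggregation_strategy="simple"` to `pipeline(...)` performs a comparable grouping inside Transformers and returns `entity_group` labels directly. The hand-written version mainly shows what that option does.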

📚 Documentation

Dataset Information

Property	Details
Model Type	DistilBERT fine-tuned for NER
Training Data	Mixed dataset from ARMAN, PEYMA, and WikiANN

Split	Records	B-DAT	B-EVE	B-FAC	B-LOC	B-MON	B-ORG	B-PCT	B-PER	B-PRO	B-TIM	I-DAT	I-EVE	I-FAC	I-LOC	I-MON	I-ORG	I-PCT	I-PER	I-PRO	I-TIM
Train	29133	1423	1487	1400	13919	417	15926	355	12347	1855	150	1947	5018	2421	4118	1059	19579	573	7699	1914	332
Valid	5142	267	253	250	2362	100	2651	64	2173	317	19	373	799	387	717	270	3260	101	1382	303	35
Test	6049	407	256	248	2886	98	3216	94	2646	318	43	568	888	408	858	263	3967	141	1707	296	78
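Under BIO tagging every entity mention begins with exactly one B- tag, so the B- columns count entity mentions per split. This assumes the data is in IOB2 form, which the card does not state explicitly. The sketch computes each type's share of training mentions from the Train row. By this count ORG makes up about 32% and TIM only about 0.3%. This imbalance may partly account for TIM's lower test F1.

```python
# Training-split B- tag counts copied from the Train row above.
TRAIN_B_COUNTS = {
    "DAT": 1423, "EVE": 1487, "FAC": 1400, "LOC": 13919, "MON": 417,
    "ORG": 15926, "PCT": 355, "PER": 12347, "PRO": 1855, "TIM": 150,
}


def type_shares(counts: dict) -> dict:
    """Fraction of all entity mentions (B- tags) belonging to each type."""
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


shares = type_shares(TRAIN_B_COUNTS)
# Print types from most to least frequent.
for t, s in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{t}: {s:.2%}")
```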

Evaluation

The following tables summarize the model's scores overall and per class.

Overall

Model	Accuracy	Precision	Recall	F1
Distilbert	0.994534	0.946326	0.95504	0.950663
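The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Recomputing it from the overall row is a quick consistency check on the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall from the table above.
print(round(f1_score(0.946326, 0.95504), 6))  # 0.950663
```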

Per entity

Entity	Support	Precision	Recall	F1
DAT	407	0.812048	0.828010	0.819951
EVE	256	0.955056	0.996094	0.975143
FAC	248	0.972549	1.000000	0.986083
LOC	2884	0.968403	0.967060	0.967731
MON	98	0.925532	0.887755	0.906250
ORG	3216	0.932095	0.951803	0.941846
PCT	94	0.936842	0.946809	0.941799
PER	2645	0.959818	0.957278	0.958546
PRO	318	0.963526	0.996855	0.979907
TIM	43	0.760870	0.813953	0.786517

❓ Questions?

Post a GitHub issue on the ParsNER Issues repo.
