🚀 DistilbertNER
This model is fine-tuned for the Named Entity Recognition (NER) task on a mixed NER dataset collected from ARMAN, [PEYMA](http://nsurl.org/2019-2/tasks/task-7-named-entity-recognition-ner-for-farsi/), and [WikiANN](https://elisa-ie.github.io/wikiann/). The dataset covers ten entity types:
- Date (DAT)
- Event (EVE)
- Facility (FAC)
- Location (LOC)
- Money (MON)
- Organization (ORG)
- Percent (PCT)
- Person (PER)
- Product (PRO)
- Time (TIM)
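Token-level NER models usually predict BIO tags rather than bare entity types, and the `B-`/`I-` columns in the dataset statistics suggest this model does too. The following is a minimal sketch of how the ten types would expand into a tag set, assuming a plain BIO scheme with a single `O` tag. The ordering and the `label2id` mapping are illustrative only; the authoritative mapping is whatever the checkpoint's `config.id2label` contains.

```python
# Entity type codes as listed above.
ENTITY_TYPES = ["DAT", "EVE", "FAC", "LOC", "MON", "ORG", "PCT", "PER", "PRO", "TIM"]


def bio_labels(entity_types):
    """Expand entity types into BIO tags: O, then B-X and I-X for each type X."""
    labels = ["O"]
    for entity_type in entity_types:
        labels += [f"B-{entity_type}", f"I-{entity_type}"]
    return labels


labels = bio_labels(ENTITY_TYPES)

# Hypothetical id mapping for illustration; the real one ships with the model config.
label2id = {label: i for i, label in enumerate(labels)}

print(len(labels))  # 21: one O tag plus B-/I- for each of the ten types
```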
✨ Features
- Fine-tuned for NER on a diverse dataset.
- Covers ten different types of entities.
📦 Installation
Installing requirements
pip install transformers
💻 Usage Examples
Basic Usage
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
from transformers import TFAutoModelForTokenClassification  # TensorFlow counterpart; not used below
model_name_or_path = "HooshvareLab/distilbert-fa-zwnj-base-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForTokenClassification.from_pretrained(model_name_or_path)
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "در سال ۲۰۱۳ درگذشت و آندرتیکر و کین برای او مراسم یادبود گرفتند."
ner_results = nlp(example)
print(ner_results)
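Without aggregation, the pipeline returns one dictionary per tagged token (with keys such as `entity`, `index`, `word`, `start`, and `end`), so multi-token entities arrive in pieces. The sketch below shows one way to merge them into spans using the character offsets. The helper name `group_bio` and the sample input are hypothetical: the sample is hand-written in English for readability and is not real output from this model.

```python
def group_bio(text, tokens):
    """Merge per-token BIO predictions into (label, start, end, surface) spans."""
    spans = []
    current = None      # [label, start, end] of the span being built
    last_index = None   # token index of the previous tagged token
    for tok in tokens:
        prefix, _, label = tok["entity"].partition("-")
        continues = (
            prefix == "I"
            and current is not None
            and current[0] == label
            and tok["index"] == last_index + 1  # tokens must be adjacent
        )
        if continues:
            current[2] = tok["end"]  # extend the open span
        else:
            if current is not None:
                spans.append(current)
            # "O" tokens (if present) close the span without opening a new one.
            current = [label, tok["start"], tok["end"]] if prefix in ("B", "I") else None
        last_index = tok["index"]
    if current is not None:
        spans.append(current)
    return [(label, start, end, text[start:end]) for label, start, end in spans]


# Hand-written sample shaped like pipeline output (not actual model predictions).
text = "Ali Daei was born in Ardabil"
sample = [
    {"entity": "B-PER", "index": 1, "word": "Ali", "start": 0, "end": 3},
    {"entity": "I-PER", "index": 2, "word": "Daei", "start": 4, "end": 8},
    {"entity": "B-LOC", "index": 6, "word": "Ardabil", "start": 21, "end": 28},
]
print(group_bio(text, sample))
```

Slicing the original string by offsets sidesteps subword markers in `word`. Recent `transformers` releases also accept an `aggregation_strategy` argument on the token-classification pipeline, which may make a helper like this unnecessary.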
📚 Documentation
Dataset Information
| Property | Details |
|---|---|
| Model Type | Distilbert fine-tuned for NER |
| Training Data | Mixed dataset from ARMAN, PEYMA, and WikiANN |
| Split | Records | B-DAT | B-EVE | B-FAC | B-LOC | B-MON | B-ORG | B-PCT | B-PER | B-PRO | B-TIM | I-DAT | I-EVE | I-FAC | I-LOC | I-MON | I-ORG | I-PCT | I-PER | I-PRO | I-TIM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Train | 29133 | 1423 | 1487 | 1400 | 13919 | 417 | 15926 | 355 | 12347 | 1855 | 150 | 1947 | 5018 | 2421 | 4118 | 1059 | 19579 | 573 | 7699 | 1914 | 332 |
| Valid | 5142 | 267 | 253 | 250 | 2362 | 100 | 2651 | 64 | 2173 | 317 | 19 | 373 | 799 | 387 | 717 | 270 | 3260 | 101 | 1382 | 303 | 35 |
| Test | 6049 | 407 | 256 | 248 | 2886 | 98 | 3216 | 94 | 2646 | 318 | 43 | 568 | 888 | 408 | 858 | 263 | 3967 | 141 | 1707 | 296 | 78 |
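To get a feel for how the data is divided, the record counts above can be turned into split proportions. This is a small arithmetic sketch using only the `Records` column; the variable names are illustrative.

```python
# Record counts per split, copied from the dataset table.
records = {"Train": 29133, "Valid": 5142, "Test": 6049}

total = sum(records.values())
shares = {split: count / total for split, count in records.items()}

for split, share in shares.items():
    print(f"{split}: {records[split]} records ({share:.1%})")
```

The split works out to roughly 72% train, 13% validation, and 15% test.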
Evaluation
The following tables summarize the model's scores overall and for each entity class.
Overall
| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| Distilbert | 0.994534 | 0.946326 | 0.95504 | 0.950663 |
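The F1 column is the harmonic mean of precision and recall, so it can be recomputed from the other two columns as a sanity check. A minimal sketch, with `f1_score` as an illustrative helper rather than a library function:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall from the table above.
overall_f1 = f1_score(0.946326, 0.95504)
print(round(overall_f1, 6))  # 0.950663, matching the reported F1
```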
Per entity
| Entity | number | precision | recall | f1 |
|---|---|---|---|---|
| DAT | 407 | 0.812048 | 0.828010 | 0.819951 |
| EVE | 256 | 0.955056 | 0.996094 | 0.975143 |
| FAC | 248 | 0.972549 | 1.000000 | 0.986083 |
| LOC | 2884 | 0.968403 | 0.967060 | 0.967731 |
| MON | 98 | 0.925532 | 0.887755 | 0.906250 |
| ORG | 3216 | 0.932095 | 0.951803 | 0.941846 |
| PCT | 94 | 0.936842 | 0.946809 | 0.941799 |
| PER | 2645 | 0.959818 | 0.957278 | 0.958546 |
| PRO | 318 | 0.963526 | 0.996855 | 0.979907 |
| TIM | 43 | 0.760870 | 0.813953 | 0.786517 |
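The source does not say how the overall precision and recall are aggregated from the per-entity rows. One plausible reading is micro-averaging, where true positives and predictions are pooled across classes before dividing. The sketch below tests that reading by recovering approximate counts from each row, assuming `number` is the count of gold entities of that type in the test set; both that interpretation and the helper name `micro_average` are assumptions.

```python
# (entity, number, precision, recall) rows copied from the per-entity table.
ROWS = [
    ("DAT", 407, 0.812048, 0.828010),
    ("EVE", 256, 0.955056, 0.996094),
    ("FAC", 248, 0.972549, 1.000000),
    ("LOC", 2884, 0.968403, 0.967060),
    ("MON", 98, 0.925532, 0.887755),
    ("ORG", 3216, 0.932095, 0.951803),
    ("PCT", 94, 0.936842, 0.946809),
    ("PER", 2645, 0.959818, 0.957278),
    ("PRO", 318, 0.963526, 0.996855),
    ("TIM", 43, 0.760870, 0.813953),
]


def micro_average(rows):
    """Pool recovered counts over all classes and return (precision, recall)."""
    tp_total = predicted_total = gold_total = 0
    for _, gold, precision, recall in rows:
        tp = round(recall * gold)          # recall = TP / gold entities
        predicted = round(tp / precision)  # precision = TP / predicted entities
        tp_total += tp
        predicted_total += predicted
        gold_total += gold
    return tp_total / predicted_total, tp_total / gold_total


micro_p, micro_r = micro_average(ROWS)
print(f"micro precision={micro_p:.6f} recall={micro_r:.6f}")
```

If the printed values agree with the overall table, that supports the micro-averaging reading. It also explains why large classes such as ORG, LOC, and PER dominate the overall score while the weaker DAT and TIM rows have little effect.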
❓ Questions?
Post a GitHub issue on the ParsNER Issues repo.