distilroberta-base-ner-conll2003
This model is a fine-tuned version of distilroberta-base on the conll2003 dataset. It is intended for token classification, specifically named entity recognition (NER).
Quick Start
Model Usage
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "philschmid/distilroberta-base-ner-conll2003"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# grouped_entities=True merges adjacent tokens predicted as the same entity into one span.
# Newer transformers releases deprecate it in favor of aggregation_strategy="simple".
nlp = pipeline("ner", model=model, tokenizer=tokenizer, grouped_entities=True)

example = "My name is Philipp and I live in Germany"
print(nlp(example))
```
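With grouped_entities=True, the pipeline returns whole entities instead of one prediction per token. The simplified sketch below illustrates that grouping idea on word-level B-/I-/O tags, assuming CoNLL-2003-style entity types (PER, ORG, LOC, MISC). It is not the transformers implementation, which also handles sub-word tokens, scores and character offsets; the helper name `group_bio` is hypothetical, and the tags in the example are hand-written for illustration, not model output.

```python
def group_bio(words, tags):
    """Hypothetical helper: turn parallel word / B-I-O tag lists into (type, text) spans."""
    entities = []
    current_type, current_words = None, []
    for word, tag in zip(words, tags):
        if tag == "O":
            prefix, etype = "O", None
        else:
            prefix, etype = tag.split("-", 1)  # e.g. "B-PER" -> ("B", "PER")
        # An I- tag of the same type as the open entity extends that entity.
        if prefix == "I" and etype == current_type:
            current_words.append(word)
            continue
        # Any other tag closes the open entity, if there is one.
        if current_type is not None:
            entities.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
        # A B- tag (or a stray I- tag) starts a new entity.
        if prefix in ("B", "I"):
            current_type, current_words = etype, [word]
    if current_type is not None:
        entities.append((current_type, " ".join(current_words)))
    return entities


words = ["My", "name", "is", "Philipp", "and", "I", "live", "in", "Germany"]
tags = ["O", "O", "O", "B-PER", "O", "O", "O", "O", "B-LOC"]  # hand-written tags
print(group_bio(words, tags))  # [('PER', 'Philipp'), ('LOC', 'Germany')]
```

Multi-word entities such as "New York" (tagged B-LOC, I-LOC) come out as a single span.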
Features
- High-performance metrics: Reaches an F1 of 0.9074 on the CoNLL2003 test set and 0.9223 on the corrected CoNLL++ test set; full precision, recall, F1 and accuracy figures are listed under Training results.
- Fine-tuned model: Based on the well-known distilroberta-base model, fine-tuned for token classification.
Installation
The model is hosted on the Hugging Face Hub. Install the libraries used in this card with:
```bash
pip install transformers datasets tokenizers torch
```
Documentation
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 4.9902376275441704e-05
- train_batch_size: 32
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 6.0
- mixed_precision_training: Native AMP
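With lr_scheduler_type: linear and no warmup listed, the learning rate decays linearly from its peak to zero over training. The sketch below mirrors the formula used by transformers' `get_linear_schedule_with_warmup`. It assumes values not stated in this card: 14,041 training sentences in the CoNLL-2003 train split, a single device, no gradient accumulation and zero warmup steps; `linear_lr` is a hypothetical helper name.

```python
import math

# Values from the hyperparameter list above.
PEAK_LR = 4.9902376275441704e-05
TRAIN_BATCH_SIZE = 32
NUM_EPOCHS = 6

# Assumptions (not stated in this card): train-split size, one device,
# no gradient accumulation, no warmup.
NUM_TRAIN_EXAMPLES = 14_041
steps_per_epoch = math.ceil(NUM_TRAIN_EXAMPLES / TRAIN_BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS


def linear_lr(step, peak_lr=PEAK_LR, total=total_steps, warmup=0):
    """Learning rate after `step` optimizer steps under a linear schedule."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total - step) / max(1, total - warmup))


for step in (0, total_steps // 2, total_steps):
    print(step, linear_lr(step))
```

Under these assumptions training runs for 2,634 optimizer steps and the learning rate halves at step 1,317.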
Training results
CoNLL2003
It achieves the following results on the evaluation set:
- Loss: 0.0583
- Precision: 0.9493
- Recall: 0.9566
- F1: 0.9529
- Accuracy: 0.9883
It achieves the following results on the test set:
- Loss: 0.2025
- Precision: 0.8999
- Recall: 0.915
- F1: 0.9074
- Accuracy: 0.9741
CoNLL++ / CoNLL2003 corrected
It achieves the following results on the evaluation set:
- Loss: 0.0567
- Precision: 0.9493
- Recall: 0.9566
- F1: 0.9529
- Accuracy: 0.9883
It achieves the following results on the test set:
- Loss: 0.1359
- Precision: 0.92
- Recall: 0.9245
- F1: 0.9223
- Accuracy: 0.9785
Framework versions
- Transformers 4.6.1
- Pytorch 1.8.1+cu101
- Datasets 1.6.2
- Tokenizers 0.10.2
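To see how your environment differs from the one listed above, you can read installed package versions with the standard library's `importlib.metadata`. This is a minimal sketch; `compare_versions` is a hypothetical helper, and a version mismatch does not by itself mean the model will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions" in this card (PyPI distribution names).
TRAINED_WITH = {
    "transformers": "4.6.1",
    "torch": "1.8.1+cu101",
    "datasets": "1.6.2",
    "tokenizers": "0.10.2",
}


def compare_versions(expected):
    """Map each package to (expected, installed); installed is None if missing."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(TRAINED_WITH).items():
    status = "match" if installed == wanted else "differs"
    print(f"{name}: trained with {wanted}, installed {installed} ({status})")
```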
License
This model is licensed under the Apache 2.0 license.
Model Metrics
| Property | Details |
|---|---|
| Model Type | Token Classification |
| Training Data | conll2003 |
| Precision (CoNLL2003 - Eval) | 0.9492923423001218 |
| Recall (CoNLL2003 - Eval) | 0.9565545901020023 |
| F1 (CoNLL2003 - Eval) | 0.9529096297690173 |
| Accuracy (CoNLL2003 - Eval) | 0.9883096560400111 |
| Precision (CoNLL2003 - Validation) | 0.9906910190038265 |
| Recall (CoNLL2003 - Validation) | 0.9916635820847483 |
| F1 (CoNLL2003 - Validation) | 0.9911770619696786 |
| Accuracy (CoNLL2003 - Validation) | 0.9883249976987512 |
| Loss (CoNLL2003 - Validation) | 0.05638007074594498 |
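The F1 values in the table are consistent with F1 being the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes F1 from the table's full-precision precision and recall values as a sanity check; `f1_score` is a hypothetical helper, not a library function.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the Model Metrics table.
rows = {
    "CoNLL2003 - Eval": (0.9492923423001218, 0.9565545901020023, 0.9529096297690173),
    "CoNLL2003 - Validation": (0.9906910190038265, 0.9916635820847483, 0.9911770619696786),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed F1 = {f1_score(p, r):.10f}, reported = {reported:.10f}")
```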