distilroberta-base-ner-conll2003 Open-Source Named Entity Recognition Model

Distilroberta Base Ner Conll2003

Developed by philschmid

Named entity recognition model fine-tuned on the CoNLL2003 dataset based on distilroberta-base

Sequence Labeling

Transformers

Open Source License: Apache-2.0 #High-precision NER #English Entity Recognition #Lightweight RoBERTa

Downloads 103

Release Time: 3/2/2022

Model Overview

This model performs token-level named entity recognition. It was fine-tuned on the CoNLL-2003 dataset and is best suited to entity recognition in English text.

Model Features

High Performance

Achieves an F1 score of 95.29 on the CoNLL-2003 evaluation set and 90.74 on the CoNLL-2003 test set

Lightweight

Based on the DistilRoBERTa architecture, which is smaller and faster than the full RoBERTa model

Verified Metrics

All performance metrics are verified, providing reliable evaluation results

Model Capabilities

Named Entity Recognition

Text Token Classification

English Text Processing

Use Cases

Information Extraction

News Entity Recognition

Identify entities such as person names, locations, and organizations in news texts

F1 score of 95.29 on the CoNLL-2003 evaluation set

Document Analysis

Extract key entities from legal or business documents
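
For document analysis, detected entities are often easiest to review when marked inline. The sketch below assumes entity dictionaries that carry `entity_group`, `start`, and `end` character offsets, as the grouped Transformers NER pipeline produces. The helper name `annotate`, the sample sentence, and the sample entities are hypothetical placeholders, not output from this model.

```python
def annotate(text, entities):
    """Wrap each entity span as [span](TYPE) using character offsets."""
    pieces = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        if ent["start"] < cursor:
            # Skip spans that overlap one already emitted
            continue
        span = text[ent["start"]:ent["end"]]
        pieces.append(text[cursor:ent["start"]])
        pieces.append("[" + span + "](" + ent["entity_group"] + ")")
        cursor = ent["end"]
    pieces.append(text[cursor:])
    return "".join(pieces)


# Hypothetical sentence and hand-written entities (NOT real model output)
sample_text = "Acme Corp signed the lease in Berlin."
sample_entities = [
    {"entity_group": "LOC", "start": 30, "end": 36},
    {"entity_group": "ORG", "start": 0, "end": 9},
]

print(annotate(sample_text, sample_entities))
```

Sorting by `start` and tracking a cursor keeps the untouched text between entities intact, so the annotated string can be diffed against the original.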

🚀 distilroberta-base-ner-conll2003

This model is a fine-tuned version of distilroberta-base on the conll2003 dataset. It is designed for token-classification tasks and offers high performance in named entity recognition.

🚀 Quick Start

Model Usage

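from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
model_id = "philschmid/distilroberta-base-ner-conll2003"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# grouped_entities=True merges sub-word tokens into whole entities;
# newer Transformers releases use aggregation_strategy="simple" instead
nlp = pipeline("ner", model=model, tokenizer=tokenizer, grouped_entities=True)
example = "My name is Philipp and live in Germany"

# Print one dict per detected entity
print(nlp(example))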
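
The pipeline call above returns a list of dictionaries, one per detected entity. With grouping enabled, each dictionary carries keys such as `entity_group`, `score`, `word`, `start`, and `end`. A minimal sketch of post-processing that output follows. The `sample_output` list is a hand-written placeholder in that shape, not real model output, and the helper name `entities_by_type` and the 0.9 score threshold are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hand-written placeholder shaped like grouped pipeline output (NOT real model output)
sample_output = [
    {"entity_group": "PER", "score": 0.99, "word": "Philipp", "start": 11, "end": 18},
    {"entity_group": "LOC", "score": 0.98, "word": "Germany", "start": 31, "end": 38},
    {"entity_group": "ORG", "score": 0.42, "word": "live", "start": 23, "end": 27},
]


def entities_by_type(entities, min_score=0.9):
    """Group entity strings by label, dropping low-confidence predictions."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            # strip() removes any leading space the tokenizer may attach
            grouped[ent["entity_group"]].append(ent["word"].strip())
    return dict(grouped)


print(entities_by_type(sample_output))
```

Filtering on `score` is optional; a suitable threshold depends on how costly false positives are in the target application.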

✨ Features

High-performance Metrics: Achieves excellent precision, recall, F1-score, and accuracy on the conll2003 dataset.
Fine-tuned Model: Based on the well-known distilroberta-base model, fine-tuned for token-classification tasks.
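
Token classification assigns one label to every token. CoNLL-2003 uses a B-/I-/O tagging scheme over four entity types (PER, ORG, LOC, MISC). The sketch below shows how per-token tags can be merged into entity spans. It works on whole words for readability, whereas the model itself predicts over sub-word tokens. The function name `bio_to_spans` and the sample sentence are illustrative assumptions.

```python
def bio_to_spans(tokens, tags):
    """Merge B-/I-/O tagged tokens into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Emit the span collected so far, if any
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == current_type:
            # Continuation of the entity currently being built
            current_tokens.append(token)
        elif tag.startswith(("B-", "I-")):
            # A new entity starts (a stray I- tag is treated like B-)
            flush()
            current_type, current_tokens = tag[2:], [token]
        else:
            # "O" tag: the token is outside any entity
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


# Illustrative word-level example with hand-assigned tags
words = ["Philipp", "lives", "in", "New", "York"]
labels = ["B-PER", "O", "O", "B-LOC", "I-LOC"]
print(bio_to_spans(words, labels))
```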

📦 Installation

Since this is a Hugging Face model, you can use the following command to install the necessary libraries:

pip install transformers datasets tokenizers torch

📚 Documentation

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 4.9902376275441704e-05
train_batch_size: 32
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 6.0
mixed_precision_training: Native AMP
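
The hyperparameters above can be written as keyword arguments using the field names of the Transformers `TrainingArguments` class, as sketched below. The mapping assumes a single training device, and the dictionary name `training_kwargs` is illustrative. The `linear_lr` helper sketches what a linear schedule does. Its warmup length and total step count are assumptions, since the model card does not state them.

```python
# Hyperparameters from the list above, keyed by TrainingArguments field names
training_kwargs = {
    "learning_rate": 4.9902376275441704e-05,
    "per_device_train_batch_size": 32,  # assumes a single device
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 6.0,
    "fp16": True,  # native AMP mixed-precision training
}


def linear_lr(step, total_steps, base_lr=training_kwargs["learning_rate"], warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


# Hypothetical run of 1000 optimizer steps with no warmup
for step in (0, 500, 1000):
    print(step, linear_lr(step, total_steps=1000))
```

With the library installed, the dictionary could be unpacked as `TrainingArguments(output_dir=..., **training_kwargs)`, where the output directory is left to the user.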

Training results

CoNLL2003

It achieves the following results on the evaluation set:

Loss: 0.0583
Precision: 0.9493
Recall: 0.9566
F1: 0.9529
Accuracy: 0.9883

It achieves the following results on the test set:

Loss: 0.2025
Precision: 0.8999
Recall: 0.915
F1: 0.9074
Accuracy: 0.9741

CoNLL++ / CoNLL2003 corrected

It achieves the following results on the evaluation set:

Loss: 0.0567
Precision: 0.9493
Recall: 0.9566
F1: 0.9529
Accuracy: 0.9883

It achieves the following results on the test set:

Loss: 0.1359
Precision: 0.92
Recall: 0.9245
F1: 0.9223
Accuracy: 0.9785

Framework versions

Transformers 4.6.1
PyTorch 1.8.1+cu101
Datasets 1.6.2
Tokenizers 0.10.2
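
To see how a local environment differs from the versions listed above, the installed package versions can be read with the standard library. This is a rough sketch: the names `card_versions` and `compare_versions` are illustrative, and a mismatch does not by itself mean the model will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the model card, keyed by PyPI package name
card_versions = {
    "transformers": "4.6.1",
    "torch": "1.8.1+cu101",
    "datasets": "1.6.2",
    "tokenizers": "0.10.2",
}


def compare_versions(expected):
    """Return {package: (listed, installed_or_None, exact_match)}."""
    report = {}
    for package, listed in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            # Package is not installed in this environment
            installed = None
        report[package] = (listed, installed, installed == listed)
    return report


for name, (listed, installed, same) in compare_versions(card_versions).items():
    print(f"{name}: listed={listed} installed={installed} match={same}")
```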

📄 License

This model is licensed under the Apache 2.0 license.

📊 Model Metrics

Property	Details
Model Type	Token Classification
Training Data	conll2003
Precision (CoNLL2003 - Eval)	0.9492923423001218
Recall (CoNLL2003 - Eval)	0.9565545901020023
F1 (CoNLL2003 - Eval)	0.9529096297690173
Accuracy (CoNLL2003 - Eval)	0.9883096560400111
Precision (CoNLL2003 - Validation)	0.9906910190038265
Recall (CoNLL2003 - Validation)	0.9916635820847483
F1 (CoNLL2003 - Validation)	0.9911770619696786
Accuracy (CoNLL2003 - Validation)	0.9883249976987512
Loss (CoNLL2003 - Validation)	0.05638007074594498
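
F1 is the harmonic mean of precision and recall, so the F1 rows in the table can be re-derived from the precision and recall rows. The short check below uses the values from the table. The function name `f1_from` is illustrative.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the "CoNLL2003 - Eval" rows
eval_f1 = f1_from(0.9492923423001218, 0.9565545901020023)

# Precision and recall copied from the "CoNLL2003 - Validation" rows
validation_f1 = f1_from(0.9906910190038265, 0.9916635820847483)

print(round(eval_f1, 4), round(validation_f1, 4))
```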
