tner-xlm-roberta-base-ontonotes5 Open Source Model - Free Support for English Text Entity Tagging and Classification

Tner Xlm Roberta Base Ontonotes5

Developed by asahi417

A named entity recognition model fine-tuned from XLM-RoBERTa for token classification on English text.

Sequence Labeling

Transformers

English #Multilingual NER #Token Classification #XLM-RoBERTa Fine-tuning

Downloads 17.30k

Release Time: 3/2/2022

Model Overview

This is a Named Entity Recognition (NER) model fine-tuned from XLM-RoBERTa. It identifies named entities in text (such as person names, organization names, and locations) and classifies each by type.

Model Features

Multilingual Pretraining Foundation

Built on the multilingually pretrained XLM-RoBERTa encoder; the fine-tuned model itself targets English text.

Entity Classification Capability

Identifies and classifies entity types in text, such as person names (PER), organization names (ORG), and locations (LOC).

Easy Integration

Can be used together with the tner library for deployment in practical applications.

Model Capabilities

Token Classification

Named Entity Recognition

English Text Processing

Use Cases

Information Extraction

News Article Entity Extraction

Extract key information such as person names, organization names, and locations from news articles

Social Media Analysis

Analyze entities mentioned in social media text

Knowledge Graph Construction

Knowledge Graph Entity Recognition

Provide entity recognition support for knowledge graph construction

🚀 XLM-RoBERTa for NER Model Card

XLM-RoBERTa fine-tuned for Named Entity Recognition (NER) as a token classification model.

🚀 Quick Start

Use the code below to get started with the model.

from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer and the fine-tuned token classification model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("asahi417/tner-xlm-roberta-base-ontonotes5")
model = AutoModelForTokenClassification.from_pretrained("asahi417/tner-xlm-roberta-base-ontonotes5")
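Loading the tokenizer and model does not yet produce any predictions. As a minimal sketch of how labels could be read out, assuming PyTorch as the backend and a fast tokenizer (so that `word_ids()` is available), the helper below tags a pre-split sentence and keeps the label of each word's first sub-token. The function names `first_subtoken_labels` and `tag_words` are illustrative and are not part of the model card or the tner library.

```python
from typing import List, Optional, Sequence


def first_subtoken_labels(word_ids: Sequence[Optional[int]],
                          token_labels: Sequence[str]) -> List[str]:
    """Collapse sub-token labels to one label per word.

    word_ids maps each sub-token to the index of its source word (None for
    special tokens such as <s> and </s>). The label predicted for the first
    sub-token of each word is kept; later sub-tokens are ignored.
    """
    word_labels: List[str] = []
    seen = set()
    for word_id, label in zip(word_ids, token_labels):
        if word_id is None or word_id in seen:
            continue
        seen.add(word_id)
        word_labels.append(label)
    return word_labels


def tag_words(words: List[str],
              model_name: str = "asahi417/tner-xlm-roberta-base-ontonotes5") -> List[str]:
    """Tag a pre-split sentence (downloads the model weights on first use)."""
    # Imported lazily so the pure helper above works without these packages.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(model_name)
    model.eval()

    encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits  # shape: (1, num_sub_tokens, num_labels)
    label_ids = logits.argmax(dim=-1)[0].tolist()
    token_labels = [model.config.id2label[i] for i in label_ids]
    return first_subtoken_labels(encoding.word_ids(0), token_labels)
```

Calling `tag_words(["Jacob", "Collier", "lives", "in", "London"])` would return one label string per input word; the exact label names depend on the `id2label` mapping stored in the model's configuration.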

✨ Features

Token Classification: The model is fine-tuned for token classification tasks in the context of Named Entity Recognition.
Compatibility: Can be used in conjunction with the tner library.

📚 Documentation

Model Details

Model Description

XLM-RoBERTa fine-tuned for NER.

Property	Details
Developed by	Asahi Ushio
Shared by [Optional]	Hugging Face
Model Type	Token Classification
Language(s) (NLP)	en
License	More information needed
Related Models	XLM-RoBERTa
Parent Model	XLM-RoBERTa
Resources for more information	GitHub Repo, Associated Paper, Space

Uses

Direct Use

Token Classification

Downstream Use [Optional]

This model can be used in conjunction with the tner library.

Out-of-Scope Use

The model should not be used to intentionally create hostile or alienating environments for people.

Bias, Risks, and Limitations

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.

⚠️ Important Note

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

Training Details

Training Data

An NER dataset contains a sequence of tokens and tags for each split (usually train/validation/test),

{
    'train': {
        'tokens': [
            ['@paulwalk', 'It', "'s", 'the', 'view', 'from', 'where', 'I', "'m", 'living', 'for', 'two', 'weeks', '.', 'Empire', 'State', 'Building', '=', 'ESB', '.', 'Pretty', 'bad', 'storm', 'here', 'last', 'evening', '.'],
            ['From', 'Green', 'Newsfeed', ':', 'AHFA', 'extends', 'deadline', 'for', 'Sage', 'Award', 'to', 'Nov', '.', '5', 'http://tinyurl.com/24agj38'], ...
        ],
        'tags': [
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], ...
        ]
    },
    'validation': ...,
    'test': ...,
}

with a dictionary to map a label to its index (label2id) as below.

{"O": 0, "B-ORG": 1, "B-MISC": 2, "B-PER": 3, "I-PER": 4, "B-LOC": 5, "I-ORG": 6, "I-MISC": 7, "I-LOC": 8}
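To illustrate how such a mapping is typically used, the sketch below inverts `label2id` and groups tag indices into entity spans. It assumes the tags follow the BIO scheme implied by the `B-`/`I-` prefixes. The function name `decode_entities` and the example sentence in the usage note are invented for illustration and do not come from the training data.

```python
from typing import Dict, List, Optional, Tuple

# Mapping copied from the text above.
label2id: Dict[str, int] = {
    "O": 0, "B-ORG": 1, "B-MISC": 2, "B-PER": 3, "I-PER": 4,
    "B-LOC": 5, "I-ORG": 6, "I-MISC": 7, "I-LOC": 8,
}
id2label: Dict[int, str] = {index: label for label, index in label2id.items()}


def decode_entities(tokens: List[str], tags: List[int]) -> List[Tuple[str, str]]:
    """Group BIO tag indices into (entity_type, text) pairs."""
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the entity being built, if any, and reset the buffer.
        nonlocal current_type, current_tokens
        if current_type is not None:
            entities.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        label = id2label[tag]
        if label == "O":
            flush()
            continue
        prefix, entity_type = label.split("-", 1)
        if prefix == "B" or entity_type != current_type:
            # A B- tag starts a new entity; a stray I- tag is treated the same way.
            flush()
            current_type = entity_type
        current_tokens.append(token)
    flush()
    return entities
```

For the made-up input `decode_entities(["Jacob", "Collier", "lives", "in", "London", "."], [3, 4, 0, 0, 5, 0])`, this returns `[("PER", "Jacob Collier"), ("LOC", "London")]`.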

Training Procedure

Preprocessing

More information needed

Speeds, Sizes, Times

Property	Details
Layer_norm_eps	1e-05
Num_attention_heads	12
Num_hidden_layers	12
Vocab_size	250002

Evaluation

Testing Data, Factors & Metrics

Testing Data

See dataset card for full dataset lists

Factors

More information needed

Metrics

More information needed

Results

More information needed

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Property	Details
Hardware Type	More information needed
Hours used	More information needed
Cloud Provider	More information needed
Compute Region	More information needed
Carbon Emitted	More information needed
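Since none of these values are reported for this model, no figure can be given here. As a rough sketch of the kind of estimate such a calculator performs, the function below multiplies the energy drawn by the hardware by the carbon intensity of the electricity grid. This is a simplified form, not the calculator's actual implementation, and the function name `estimate_co2_kg` and its parameters are hypothetical.

```python
def estimate_co2_kg(power_kw: float, hours: float, grid_kg_per_kwh: float) -> float:
    """Simplified estimate: energy used (kWh) times grid carbon intensity.

    power_kw        -- average power draw of the hardware, in kilowatts
    hours           -- total training time, in hours
    grid_kg_per_kwh -- carbon intensity of the electricity, in kg CO2eq per kWh
    """
    if power_kw < 0 or hours < 0 or grid_kg_per_kwh < 0:
        raise ValueError("all inputs must be non-negative")
    energy_kwh = power_kw * hours
    return energy_kwh * grid_kg_per_kwh
```

With purely hypothetical inputs that are not measurements of this model, `estimate_co2_kg(0.25, 8, 0.5)` evaluates to `1.0` kg CO2eq.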

Citation

BibTeX:

@inproceedings{ushio-camacho-collados-2021-ner,
    title = "{T}-{NER}: An All-Round Python Library for Transformer-based Named Entity Recognition",
    author = "Ushio, Asahi  and
      Camacho-Collados, Jose",
    booktitle = "Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations",
    month = apr,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.eacl-demos.7",
    pages = "53--62",
}

Model Card Authors [Optional]

Asahi Ushio in collaboration with Ezi Ozoani and the Hugging Face team.
