xlm-roberta-large-ner: an open-source named entity recognition model for recognizing multiple entity types in Uzbek text

Xlm Roberta Large Ner

Developed by risqaliyevds

A named entity recognition model specifically designed for Uzbek text, built on the XLM-RoBERTa large architecture, supporting multiple entity category recognition.

Sequence Labeling

Transformers

Other · Open Source License: MIT · #Uzbek NER #Specialized for news text #Multi-entity recognition

Downloads 195

Release Time: 5/28/2024

Model Overview

This model identifies a range of named entity categories in Uzbek text, such as person names, place names, organization names, and dates. Its accuracy is highest on news text.

Model Features

Multi-category entity recognition

Supports recognition of 18 different named entity categories, including person names, place names, organization names, dates, monetary amounts, etc.

News text optimization

The model is trained on the NEWS dataset, making it particularly suitable for named entity recognition tasks in news texts.

High accuracy

Demonstrates high recognition accuracy in Uzbek NER tasks.
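The example output further down this page uses `B-` and `I-` label prefixes, which suggests a BIO tagging scheme. Assuming that scheme, each of the 18 categories contributes a `B-` (beginning) and an `I-` (inside) label, plus one shared `O` label for non-entity tokens, for 2 × 18 + 1 = 37 labels in total. The sketch below builds such a label set and splits labels back into prefix and category. Only `PERSON` and `GPE` come from the example output; the other category names are hypothetical placeholders, and the model's actual label list should be read from its configuration (`model.config.id2label` in Transformers).

```python
def build_bio_labels(categories):
    """Return a BIO label list: 'O' first, then a B-/I- pair for each category."""
    labels = ["O"]
    for category in categories:
        labels.append(f"B-{category}")
        labels.append(f"I-{category}")
    return labels


def split_bio_label(label):
    """Split 'B-PERSON' into ('B', 'PERSON'); the outside label 'O' becomes ('O', None)."""
    if label == "O":
        return ("O", None)
    prefix, _, category = label.partition("-")
    return (prefix, category)


# PERSON and GPE appear in the example output; the other 16 names are placeholders.
categories = ["PERSON", "GPE"] + [f"CATEGORY_{i}" for i in range(3, 19)]
labels = build_bio_labels(categories)
print(len(categories), len(labels))  # 18 37
print(split_bio_label("B-PERSON"))   # ('B', 'PERSON')
```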

Model Capabilities

Uzbek text processing

Named entity recognition

Multi-category entity classification

Use Cases

Text analysis

News text entity extraction

Extract key information such as person names, place names, and organization names from Uzbek news

Accurately identifies key entities in news

Document information extraction

Process Uzbek documents to extract named entity information

Structures key information in documents

Academic research

Linguistic research

Used for studying linguistic features and entity distribution in Uzbek
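The news-extraction and document-structuring use cases above come down to collecting entity strings by category. The following is a minimal sketch. It assumes the input is a list of grouped predictions with `entity_group` and `word` keys, which is the format the Transformers `ner` pipeline returns when created with an `aggregation_strategy` such as `"simple"`. The input list is hand-written from the example sentence used on this page; it is not real model output.

```python
from collections import defaultdict


def entities_by_type(entities, wanted=None):
    """Group entity strings by category, keeping first-seen order and dropping repeats."""
    grouped = defaultdict(list)
    for ent in entities:
        category = ent["entity_group"]
        if wanted is not None and category not in wanted:
            continue  # caller asked only for specific categories
        text = ent["word"].strip()
        if text not in grouped[category]:
            grouped[category].append(text)
    return dict(grouped)


# Hand-written input in the grouped pipeline format (the person is mentioned twice).
entities = [
    {"entity_group": "PERSON", "word": "Shavkat Mirziyoyev"},
    {"entity_group": "GPE", "word": "Rossiya"},
    {"entity_group": "PERSON", "word": "Shavkat Mirziyoyev"},
]
print(entities_by_type(entities))
# {'PERSON': ['Shavkat Mirziyoyev'], 'GPE': ['Rossiya']}
print(entities_by_type(entities, wanted={"GPE"}))
# {'GPE': ['Rossiya']}
```

Returning plain lists per category keeps the result easy to serialize, for example as JSON for a document index.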

🚀 Named Entity Recognition (NER) Model for Uzbek Language

This model is designed to identify various named entities in Uzbek text, offering high accuracy and a wide range of recognized categories.

🚀 Quick Start

This model performs Named Entity Recognition (NER) on Uzbek text. It identifies diverse categories of named entities, such as persons, places, organizations, and dates. It is based on the XLM-RoBERTa large architecture, was trained on the NEWS dataset, and is most accurate on news texts.


💻 Usage Examples

Basic Usage

import torch
from transformers import pipeline, AutoTokenizer, AutoModelForTokenClassification

model_name_or_path = "risqaliyevds/xlm-roberta-large-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForTokenClassification.from_pretrained(model_name_or_path)

# Use the first GPU if one is available, otherwise fall back to the CPU
device = 0 if torch.cuda.is_available() else -1
nlp = pipeline("ner", model=model, tokenizer=tokenizer, device=device)

text = "Shavkat Mirziyoyev Rossiyada rasmiy safarda bo'ldi."
ner = nlp(text)

# Each item is one token-level prediction: label, score, token, and character offsets
for entity in ner:
    print(entity)

Example text: "Shavkat Mirziyoyev Rossiyada rasmiy safarda bo'ldi."

Results:

[{'entity': 'B-PERSON', 'score': 0.88995147, 'index': 1, 'word': '▁Shavkat', 'start': 0, 'end': 7},
 {'entity': 'I-PERSON', 'score': 0.980681, 'index': 2, 'word': '▁Mirziyoyev', 'start': 8, 'end': 18},
 {'entity': 'B-GPE', 'score': 0.8208886, 'index': 3, 'word': '▁Rossiya', 'start': 19, 'end': 26}]
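The output above is per token, so "Shavkat" and "Mirziyoyev" come back as separate `B-PERSON` and `I-PERSON` items. The sketch below merges such items into whole entity spans by slicing the original text with the `start`/`end` offsets. It is a hand-rolled post-processing step, not part of the model. It assumes each dict carries `entity`, `start`, and `end` keys. It tolerates labels written with spaces (`B - PERSON`). It also treats a piece glued to the previous one with no gap as a subword of the same word.

```python
def merge_bio_tokens(text, tokens):
    """Merge token-level BIO predictions into (type, surface text, start, end) spans."""
    spans = []
    current = None  # [category, start, end] of the span being built
    for tok in tokens:
        # Normalize labels such as 'B - PERSON' to 'B-PERSON', then split prefix/type
        prefix, _, category = tok["entity"].replace(" ", "").partition("-")
        if prefix not in ("B", "I"):
            continue  # skip 'O' or unexpected labels
        continues = False
        if current is not None and category == current[0]:
            gap = text[current[2]:tok["start"]]
            # An I- tag after whitespace, or any tag glued to the previous piece
            # (a subword of the same word), extends the current span
            continues = gap.strip() == "" and (prefix == "I" or gap == "")
        if continues:
            current[2] = tok["end"]
        else:
            if current is not None:
                spans.append(current)
            current = [category, tok["start"], tok["end"]]
    if current is not None:
        spans.append(current)
    return [(cat, text[start:end], start, end) for cat, start, end in spans]


text = "Shavkat Mirziyoyev Rossiyada rasmiy safarda bo'ldi."
# Token-level results as printed in the example above
tokens = [
    {"entity": "B-PERSON", "score": 0.88995147, "index": 1, "word": "▁Shavkat", "start": 0, "end": 7},
    {"entity": "I-PERSON", "score": 0.980681, "index": 2, "word": "▁Mirziyoyev", "start": 8, "end": 18},
    {"entity": "B-GPE", "score": 0.8208886, "index": 3, "word": "▁Rossiya", "start": 19, "end": 26},
]
for span in merge_bio_tokens(text, tokens):
    print(span)
# ('PERSON', 'Shavkat Mirziyoyev', 0, 18)
# ('GPE', 'Rossiya', 19, 26)
```

Slicing the source text, rather than joining the `word` values, avoids having to strip the `▁` word-boundary markers from the tokens.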

Advanced Usage

import torch
from transformers import pipeline, AutoTokenizer, AutoModelForTokenClassification

model_name_or_path = "risqaliyevds/xlm-roberta-large-ner"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForTokenClassification.from_pretrained(model_name_or_path)

# Run on GPU 0 when available, otherwise on the CPU
device = 0 if torch.cuda.is_available() else -1
nlp = pipeline("ner", model=model, tokenizer=tokenizer, device=device)
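Every prediction carries a `score`. One simple post-processing step keeps only predictions above a confidence threshold and can restrict the categories as well. The sketch below does this under the assumption that dropping low-confidence predictions suits your application. The 0.85 default is a hypothetical value, not one recommended by the model author. The function accepts both the token-level dicts shown in Basic Usage and grouped dicts with an `entity_group` key, which the Transformers pipeline returns when created with an `aggregation_strategy` such as `"simple"`.

```python
def filter_entities(entities, min_score=0.85, keep_types=None):
    """Keep predictions with score >= min_score and, optionally, a listed category."""
    kept = []
    for ent in entities:
        # Works for token-level ('entity': 'B-PERSON') and grouped ('entity_group': 'PERSON') dicts
        label = ent.get("entity_group") or ent["entity"]
        category = label.replace(" ", "").split("-", 1)[-1]
        if float(ent["score"]) < min_score:
            continue
        if keep_types is not None and category not in keep_types:
            continue
        kept.append(ent)
    return kept


# Token-level results from the Basic Usage example
results = [
    {"entity": "B-PERSON", "score": 0.88995147, "word": "▁Shavkat", "start": 0, "end": 7},
    {"entity": "I-PERSON", "score": 0.980681, "word": "▁Mirziyoyev", "start": 8, "end": 18},
    {"entity": "B-GPE", "score": 0.8208886, "word": "▁Rossiya", "start": 19, "end": 26},
]
print([e["word"] for e in filter_entities(results)])
# ['▁Shavkat', '▁Mirziyoyev']
print([e["word"] for e in filter_entities(results, min_score=0.0, keep_types={"GPE"})])
# ['▁Rossiya']
```

Filtering token-level output can leave an `I-` token without its `B-` token. It is usually safer to apply a threshold after tokens have been grouped into entities.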

📚 Documentation

Note

The model was trained on the NEWS dataset, so its accuracy is highest on news texts.

📄 License

This model is open source, released under the MIT license, and free for anyone to use.

📞 Contact

If you have any questions or need more information, please contact us. LinkedIn: Riskaliev Murad

🌟 Conclusion

The NER model for the Uzbek language is effective in identifying various named entities in texts. Its high accuracy and wide range of categories make it useful for academic research, document analysis, and many other fields.

Property	Details
Model Type	Named Entity Recognition (NER) Model for Uzbek Language
Training Data	NEWS dataset
Metrics	accuracy
Pipeline Tag	token-classification
Tags	ner, uzbek_ner, ner_for_uzbek_language
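The table lists `accuracy` as the metric but does not say how it is computed. If it is token-level accuracy, it is the fraction of tokens whose predicted tag matches the gold tag. This is an assumption; the card does not confirm it. Most tokens in running text are `O`, so this figure can stay high even when whole entities are missed. The sketch below uses hypothetical tag sequences to show this effect.

```python
def token_accuracy(gold, pred):
    """Fraction of positions where the predicted tag equals the gold tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


# Hypothetical tags for a 6-token sentence: the GPE entity is missed entirely,
# so only 1 of 2 entities is found, yet 5 of 6 tags still match.
gold = ["B-PERSON", "I-PERSON", "B-GPE", "O", "O", "O"]
pred = ["B-PERSON", "I-PERSON", "O", "O", "O", "O"]
print(token_accuracy(gold, pred))  # 0.8333333333333334
```

Because of this effect, entity-level precision and recall are often reported alongside token accuracy for NER.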

