🚀 Named Entity Recognition (NER) Model for Uzbek Language
This model is designed to identify various named entities in Uzbek text, offering high accuracy and a wide range of recognized categories.
🚀 Quick Start
This model performs Named Entity Recognition (NER) on Uzbek text. It identifies named entities such as persons, places, organizations, and dates. It is based on the XLM-RoBERTa large architecture and was trained on the NEWS dataset, so it shows high accuracy on news text.
✨ Features
Categories
The model can recognize the following NER categories:
- LOC (Location names)
- ORG (Organization names)
- PERSON (Person names)
- DATE (Date expressions)
- MONEY (Monetary amounts)
- PERCENT (Percentage values)
- QUANTITY (Quantities)
- TIME (Time expressions)
- PRODUCT (Product names)
- EVENT (Event names)
- WORK_OF_ART (Work of art titles)
- LANGUAGE (Language names)
- CARDINAL (Cardinal numbers)
- ORDINAL (Ordinal numbers)
- NORP (Nationalities or religious/political groups)
- FACILITY (Facility names)
- LAW (Laws or regulations)
- GPE (Countries, cities, states)
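As a rough illustration of how these categories can be used downstream, the sketch below expands the list into a tag set and filters predictions by category. It assumes the BIO tagging scheme (a `B-` or `I-` prefix per category, plus `O`), which matches the tags in the model's sample output. The names `BIO_LABELS`, `category_of`, and `keep_categories` are hypothetical helpers, and the order of `BIO_LABELS` is illustrative only; the model's real label mapping is in `model.config.id2label`.

```python
# Categories listed in this model card.
CATEGORIES = [
    "LOC", "ORG", "PERSON", "DATE", "MONEY", "PERCENT", "QUANTITY", "TIME",
    "PRODUCT", "EVENT", "WORK_OF_ART", "LANGUAGE", "CARDINAL", "ORDINAL",
    "NORP", "FACILITY", "LAW", "GPE",
]

# Assumption: BIO scheme, with "O" for tokens outside any entity.
BIO_LABELS = ["O"] + [f"{prefix}-{cat}" for cat in CATEGORIES for prefix in ("B", "I")]


def category_of(tag):
    """Return the category of a tag such as 'B-PERSON', or None for 'O' or unknown tags."""
    prefix, _, cat = (part.strip() for part in tag.partition("-"))
    return cat if prefix in ("B", "I") and cat in CATEGORIES else None


def keep_categories(predictions, wanted, min_score=0.0):
    """Keep only predictions whose category is in `wanted` and whose score is high enough."""
    wanted = set(wanted)
    return [
        p for p in predictions
        if category_of(p["entity"]) in wanted and p["score"] >= min_score
    ]


# Hand-written sample predictions in the shape returned by the "ner" pipeline.
sample = [
    {"entity": "B-PERSON", "score": 0.89, "word": "▁Shavkat"},
    {"entity": "B-GPE", "score": 0.82, "word": "▁Rossiya"},
]
people = keep_categories(sample, ["PERSON"], min_score=0.5)
print(len(BIO_LABELS), [p["word"] for p in people])
```

Filtering on the category rather than the full tag keeps the `B-` and `I-` pieces of one entity together.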
💻 Usage Examples
Basic Usage
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForTokenClassification

model_name_or_path = "risqaliyevds/xlm-roberta-large-ner"
# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForTokenClassification.from_pretrained(model_name_or_path).to(device)
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

text = "Shavkat Mirziyoyev Rossiyada rasmiy safarda bo'ldi."
ner = nlp(text)
# Each result is a dict with the predicted tag, score, token, and character offsets.
for entity in ner:
    print(entity)
Example text: "Shavkat Mirziyoyev Rossiyada rasmiy safarda bo'ldi."
Results:
[{'entity': 'B-PERSON', 'score': 0.88995147, 'index': 1, 'word': '▁Shavkat', 'start': 0, 'end': 7},
 {'entity': 'I-PERSON', 'score': 0.980681, 'index': 2, 'word': '▁Mirziyoyev', 'start': 8, 'end': 18},
 {'entity': 'B-GPE', 'score': 0.8208886, 'index': 3, 'word': '▁Rossiya', 'start': 19, 'end': 26}]
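The results above are per token, so a multi-word name such as "Shavkat Mirziyoyev" arrives as separate `B-` and `I-` pieces. The sketch below shows one way to merge them into whole entities using the character offsets. `merge_bio_entities` is a hypothetical helper, not part of the model or of `transformers`, and averaging the token scores is an assumption chosen for simplicity.

```python
def merge_bio_entities(text, tokens):
    """Merge token-level B-/I- predictions into whole-entity spans."""
    spans = []
    current = None  # the span currently being extended, if any
    for tok in tokens:
        # Strip whitespace so tags written as 'B - PERSON' are handled too.
        prefix, _, label = (part.strip() for part in tok["entity"].partition("-"))
        if not label:
            current = None  # an "O" or malformed tag closes the open span
            continue
        if prefix == "I" and current is not None and current["label"] == label:
            # Continuation of the open entity: extend it to this token's end.
            current["end"] = tok["end"]
            current["scores"].append(tok["score"])
        else:
            # A "B-" tag (or a stray "I-" tag) starts a new entity.
            current = {
                "label": label,
                "start": tok["start"],
                "end": tok["end"],
                "scores": [tok["score"]],
            }
            spans.append(current)
    return [
        {
            "label": s["label"],
            "text": text[s["start"]:s["end"]],
            "start": s["start"],
            "end": s["end"],
            "score": sum(s["scores"]) / len(s["scores"]),  # assumption: mean score
        }
        for s in spans
    ]


text = "Shavkat Mirziyoyev Rossiyada rasmiy safarda bo'ldi."
tokens = [
    {"entity": "B-PERSON", "score": 0.88995147, "word": "▁Shavkat", "start": 0, "end": 7},
    {"entity": "I-PERSON", "score": 0.980681, "word": "▁Mirziyoyev", "start": 8, "end": 18},
    {"entity": "B-GPE", "score": 0.8208886, "word": "▁Rossiya", "start": 19, "end": 26},
]
merged = merge_bio_entities(text, tokens)
for entity in merged:
    print(entity["label"], entity["text"])
# PERSON Shavkat Mirziyoyev
# GPE Rossiya
```

The `transformers` pipeline can also group tokens itself through its `aggregation_strategy` argument, for example `pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")`; the manual version is shown only to make the grouping logic explicit.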
📚 Documentation
Note
The model was trained on the NEWS dataset, so its accuracy is highest on news text. Expect lower accuracy on text from other domains.
📄 License
This model is provided as open source and is available for free use by all users.
📞 Contact
If you have any questions or need more information, please contact us.
LinkedIn: Riskaliev Murad
🌟 Conclusion
The NER model for the Uzbek language is effective in identifying various named entities in texts. Its high accuracy and wide range of categories make it useful for academic research, document analysis, and many other fields.
| Property | Details |
|---|---|
| Model Type | Named Entity Recognition (NER) Model for Uzbek Language |
| Training Data | NEWS dataset |
| Metrics | accuracy |
| Pipeline Tag | token-classification |
| Tags | ner, uzbek_ner, ner_for_uzbek_language |