rured2-ner-microsoft-mdeberta-v3-base: Open-Source Model for Russian Named Entity Recognition


Developed by denis-gordeev

A Russian named entity recognition model fine-tuned from microsoft/mdeberta-v3-base that can assign several labels to a single token

Sequence Labeling

Transformers

Open Source License: MIT

Tags: Russian Multi-label NER, Entity Recognition Fine-tuning, Commercial Alternative Analysis

Downloads 132

Release Time: 11/15/2023

Model Overview

This model is a multi-label named entity recognition (NER) model for Russian text, fine-tuned on the RURED2 dataset, capable of identifying multiple entity types in text.

Model Features

Multi-label Output

Each token can receive several labels at once, so a word that belongs to more than one entity type is tagged with all of them

Russian Language Optimization

A named entity recognition model specifically optimized for Russian text

Based on mdeberta-v3-base

Fine-tuned from microsoft/mdeberta-v3-base, the multilingual DeBERTa V3 base model

Model Capabilities

Russian Text Analysis

Named Entity Recognition

Multi-label Classification

Use Cases

News Analysis

News Entity Extraction

Extracting entities such as person names, place names, and organization names from Russian news

Successfully identified brand names (Perspective, Ketroy, Mexx) and company names (Chita Spring) in the example

Business Intelligence

Brand Monitoring

Tracking brands and companies mentioned in Russian media

Capable of identifying alternative brands and local product information

Legal & Security

Crime Report Analysis

Extracting information about involved individuals and locations from police reports

Identified crime locations (Novosibirsk) and suspect identities (Tomsk resident) in the example

🚀 denis-gordeev/rured2-ner-microsoft-mdeberta-v3-base

This is a Russian NER model fine-tuned on RURED2. It's a fine-tuned version of microsoft/mdeberta-v3-base, aiming to output multiple possible labels for a single token.
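To make the multi-label idea concrete, here is a minimal sketch of how one token's scores could be decoded. It mirrors the rule used in the Quick Start code (independent sigmoid per label, threshold 0.5, fall back to the single best label when nothing passes), but it is not the author's implementation. The function name `decode_token`, the logit values, and the label names are hypothetical; the real label set comes from `model.config.id2label`.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    """Map a raw logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_token(logits: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every label whose sigmoid probability exceeds the threshold.

    If no label passes, fall back to the single highest-scoring label,
    so each token always receives at least one label.
    """
    probs = {label: sigmoid(z) for label, z in logits.items()}
    selected = [label for label, p in probs.items() if p > threshold]
    if not selected:
        selected = [max(probs, key=probs.get)]
    return selected


# Hypothetical logits for one token; the values are illustrative only.
token_logits = {"O": -3.0, "B-city": 1.2, "B-gpe": 0.8, "B-person": -2.5}
print(decode_token(token_logits))
```

Because each label is scored independently, two labels can both pass the threshold for the same token, which a softmax over labels would not allow.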

🚀 Quick Start

If you have any questions, you can message the author at https://t.me/nlp_party.

Here is an example of how to use this model:

import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_name = "denis-gordeev/rured2-ner-microsoft-mdeberta-v3-base"
# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForTokenClassification.from_pretrained(model_name).to(device)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(model_name)


def predict(text: str, glue_tokens=False, output_together=True, glue_words=True):
    # glue_tokens is not used; it is kept so existing calls keep working.
    tokenized = tokenizer(text)
    input_ids = torch.tensor(
        [tokenized["input_ids"]], dtype=torch.long
    ).to(device)
    token_type_ids = torch.tensor(
        [tokenized["token_type_ids"]], dtype=torch.long
    ).to(device)
    attention_mask = torch.tensor(
        [tokenized["attention_mask"]], dtype=torch.long
    ).to(device)
    # Inference only, so gradients are not needed.
    with torch.no_grad():
        preds = model(
            input_ids=input_ids,
            token_type_ids=token_type_ids,
            attention_mask=attention_mask,
        )
    # Independent per-label probabilities (multi-label), not a softmax.
    probs = torch.sigmoid(preds.logits)

    output_tokens = []
    output_preds = []
    id_to_label = {int(k): v for k, v in model.config.id2label.items()}
    for i, token in enumerate(input_ids[0]):
        # Token IDs 0-3 are treated as special tokens and skipped.
        if token <= 3:
            continue
        # Keep every label above 0.5; if none passes, keep the best one.
        class_ids = (probs[0][i] > 0.5).nonzero().flatten()
        if class_ids.shape[0] >= 1:
            class_names = [id_to_label[int(cl)] for cl in class_ids]
        else:
            class_names = [id_to_label[int(probs[0][i].argmax())]]
        converted_token = tokenizer.convert_ids_to_tokens([int(token)])[0]
        # SentencePiece marks the first piece of a word with "▁".
        new_word_bool = converted_token.startswith("▁")
        converted_token = converted_token.replace("▁", "")
        if glue_words and not new_word_bool and output_tokens:
            # Continuation piece: append it to the previous word.
            # The word keeps the labels of its first piece.
            output_tokens[-1] += converted_token
        else:
            output_tokens.append(converted_token)
            output_preds.append(class_names)
    if output_together:
        return [[output_tokens[t_i], output_preds[t_i]] for t_i in range(len(output_tokens))]
    return output_tokens, output_preds
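With `output_together=True`, `predict` returns a list of `[word, labels]` pairs. A common next step is to merge those pairs into entity spans. The sketch below is one possible way to do that and is not part of the original model card: `group_entities` is a hypothetical helper, it assumes labels follow the `B-type` / `I-type` / `O` pattern seen in the metrics table, and the sample pairs are hand-written rather than real model output.

```python
from typing import Dict, List, Sequence, Tuple


def group_entities(pairs: Sequence[Sequence]) -> List[Tuple[str, str]]:
    """Merge [word, labels] pairs into (entity_type, text) spans.

    Each entity type is tracked separately, because one word may carry
    labels for several types at once.
    """
    open_spans: Dict[str, List[str]] = {}
    entities: List[Tuple[str, str]] = []
    for word, labels in pairs:
        # Map entity type -> "B" or "I" for this word, ignoring "O".
        current: Dict[str, str] = {}
        for label in labels:
            if "-" not in label:
                continue
            prefix, entity_type = label.split("-", 1)
            current[entity_type] = prefix
        # Close every open span that this word does not continue.
        for entity_type in list(open_spans):
            if current.get(entity_type) != "I":
                words = open_spans.pop(entity_type)
                entities.append((entity_type, " ".join(words)))
        # Extend continued spans; start new ones otherwise.
        for entity_type, prefix in current.items():
            if prefix == "I" and entity_type in open_spans:
                open_spans[entity_type].append(word)
            else:
                open_spans[entity_type] = [word]
    # Flush spans still open at the end of the text.
    for entity_type, words in open_spans.items():
        entities.append((entity_type, " ".join(words)))
    return entities


# Hand-written sample in the shape predict() returns (not real output).
sample = [
    ["В", ["O"]],
    ["Новосибирске", ["B-city", "B-gpe"]],
    ["задержан", ["O"]],
    ["Иван", ["B-person"]],
    ["Петров", ["I-person"]],
]
print(group_entities(sample))
```

An `I-` label with no open span of the same type is treated as the start of a new span here; stricter handling may suit some applications better.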

📚 Documentation

This model is a fine-tuned version of microsoft/mdeberta-v3-base on the RURED2 dataset. It achieves the following results on the evaluation set:

Metric	Value
Loss	0.0096
F1 Micro	0.5837
O F1 Micro	0.6370
O Recall Micro	0.9242
O Precision Micro	0.4860
B-person F1 Micro	0.9639
B-person Recall Micro	0.9816
B-person Precision Micro	0.9468
B-norp F1 Micro	0.6190
B-norp Recall Micro	0.8667
B-norp Precision Micro	0.4815
B-commodity F1 Micro	0.7553
B-commodity Recall Micro	0.9470
B-commodity Precision Micro	0.6281
B-date F1 Micro	0.8386
B-date Recall Micro	0.8471
B-date Precision Micro	0.8304
I-date F1 Micro	0.6419
I-date Recall Micro	0.9492
I-date Precision Micro	0.4849
B-country F1 Micro	0.6152
B-country Recall Micro	0.9765
B-country Precision Micro	0.4490
B-economic Sector F1 Micro	0.5576
B-economic Sector Recall Micro	0.5897
B-economic Sector Precision Micro	0.5287
I-economic Sector F1 Micro	0.2517
I-economic Sector Recall Micro	0.6667
I-economic Sector Precision Micro	0.1551
B-news Source F1 Micro	0.7988
B-news Source Recall Micro	0.8327
B-news Source Precision Micro	0.7677
B-profession F1 Micro	0.8088
B-profession Recall Micro	0.9464
B-profession Precision Micro	0.7061
I-news Source F1 Micro	0.4808
I-news Source Recall Micro	0.8400
I-news Source Precision Micro	0.3368
I-person F1 Micro	0.3381
I-person Recall Micro	0.996
I-person Precision Micro	0.2036
B-organization F1 Micro	0.8350
B-organization Recall Micro	0.8993
B-organization Precision Micro	0.7794
I-profession F1 Micro	0.2462
I-profession Recall Micro	0.8030
I-profession Precision Micro	0.1454
B-event F1 Micro	0.5658
B-event Recall Micro	0.5436
B-event Precision Micro	0.5899
B-city F1 Micro	0.625
B-city Recall Micro	0.8904
B-city Precision Micro	0.4815
B-gpe F1 Micro	0.6760
B-gpe Recall Micro	0.9380
B-gpe Precision Micro	0.5284
I-event F1 Micro	0.2577
I-event Recall Micro	0.3776
I-event Precision Micro	0.1956
B-group F1 Micro	0.6667
B-group Recall Micro	0.75
B-group Precision Micro	0.6
B-ordinal F1 Micro	0.5306
B-ordinal Recall Micro	0.8125
B-ordinal Precision Micro	0.3939
B-product F1 Micro	0.6683
B-product Recall Micro	0.8232
B-product Precision Micro	0.5625
I-organization F1 Micro	0.3128
I-organization Recall Micro	0.8425
I-organization Precision Micro	0.1921
B-money F1 Micro	0.8530
B-money Recall Micro	0.8947
B-money Precision Micro	0.8151
I-money F1 Micro	0.6259
I-money Recall Micro	0.9644
I-money Precision Micro	0.4632
B-currency F1 Micro	0.7441
B-currency Recall Micro	0.9658
B-currency Precision Micro	0.6052
B-percent F1 Micro	0.8639
B-percent Recall Micro	0.8902
B-percent Precision Micro	0.8391
I-percent F1 Micro	0.6995
I-percent Recall Micro	0.9846
I-percent Precision Micro	0.5424
I-group F1 Micro	0.1844
I-group Recall Micro	0.4836
I-group Precision Micro	0.1139
B-cardinal F1 Micro	0.6903
B-cardinal Recall Micro	0.7358
B-cardinal Precision Micro	0.65
B-law F1 Micro	0.3704
B-law Recall Micro	0.3571
B-law Precision Micro	0.3846
I-law F1 Micro	0.3246
I-law Recall Micro	0.3936
I-law Precision Micro	0.2761
B-fac F1 Micro	0.6910
B-fac Recall Micro	0.6910
B-fac Precision Micro	0.6910
I-fac F1 Micro	0.3007
I-fac Recall Micro	0.7151
I-fac Precision Micro	0.1904
B-age F1 Micro	0.8649
B-age Recall Micro	0.7619
B-age Precision Micro	1.0
I-city F1 Micro	0.1047
I-city Recall Micro	0.6429
I-city Precision Micro	0.0570
B-work Of Art F1 Micro	0.3158
B-work Of Art Recall Micro	0.375
B-work Of Art Precision Micro	0.2727
I-work Of Art F1 Micro	0.3721
I-work Of Art Recall Micro	0.5
I-work Of Art Precision Micro	0.2963
B-region F1 Micro	0.8070
B-region Recall Micro	0.7731
B-region Precision Micro	0.8440
I-region F1 Micro	0.2817
I-region Recall Micro	0.8197
I-region Precision Micro	0.1701
I-cardinal F1 Micro	0.3851
I-cardinal Recall Micro	0.4831
I-cardinal Precision Micro	0.3202
I-currency F1 Micro	0.0
I-currency Recall Micro	0.0
I-currency Precision Micro	0.0
B-quantity F1 Micro	0.7311
B-quantity Recall Micro	0.7311
B-quantity Precision Micro	0.7311
I-quantity F1 Micro	0.4889
I-quantity Recall Micro	0.7989
I-quantity Precision Micro	0.3522
B-crime F1 Micro	0.3736
B-crime Recall Micro	0.4048
B-crime Precision Micro	0.3469
I-crime F1 Micro	0.3245
I-crime Recall Micro	0.5648
I-crime Precision Micro	0.2276
B-trade Agreement F1 Micro	0.7170
B-trade Agreement Recall Micro	0.7037
B-trade Agreement Precision Micro	0.7308
B-nationality F1 Micro	0.0
B-nationality Recall Micro	0.0
B-nationality Precision Micro	0.0
B-family F1 Micro	0.5
B-family Recall Micro	0.8889
B-family Precision Micro	0.3478
I-family F1 Micro	0.0
I-family Recall Micro	0.0
I-family Precision Micro	0.0
I-product F1 Micro	0.2021
I-product Recall Micro	0.6824
I-product Precision Micro	0.1186
B-time F1 Micro	0.6538
B-time Recall Micro	0.6296
B-time Precision Micro	0.68
I-time F1 Micro	0.6118
I-time Recall Micro	0.9811
I-time Precision Micro	0.4444
I-commodity F1 Micro	0.0444
I-commodity Recall Micro	0.1667
I-commodity Precision Micro	0.0256
B-application F1 Micro	0.0
B-application Recall Micro	0.0
B-application Precision Micro	0.0
I-application F1 Micro	0.0
I-application Recall Micro	0.0
I-application Precision Micro	0.0
I-country F1 Micro	0.1695
I-country Recall Micro	0.7895
I-country Precision Micro	0.0949
B-award F1 Micro	0.5455
B-award Recall Micro	0.4615
B-award Precision Micro	0.6667
I-award F1 Micro	0.4459
I-award Recall Micro	0.8049
I-award Precision Micro	0.3084
I-gpe F1 Micro	0.3284
I-gpe Recall Micro	0.9167
I-gpe Precision Micro	0.2
B-location F1 Micro	0.4885
B-location Recall Micro	0.5161
B-location Precision Micro	0.4638
I-location F1 Micro	0.3189
I-location Recall Micro	0.6316
I-location Precision Micro	0.2133
I-ordinal F1 Micro	0.5
I-ordinal Recall Micro	0.4
I-ordinal Precision Micro	0.6667
I-trade Agreement F1 Micro	0.1163
I-trade Agreement Recall Micro	0.3846
I-trade Agreement Precision Micro	0.0685
B-religion F1 Micro	0.0
B-religion Recall Micro	0.0
B-religion Precision Micro	0.0
I-age F1 Micro	0.4324
I-age Recall Micro	0.5714
I-age Precision Micro	0.3478
B-investment Program F1 Micro	0.0
B-investment Program Recall Micro	0.0
B-investment Program Precision Micro	0.0
I-investment Program F1 Micro	0.0
I-investment Program Recall Micro	0.0
I-investment Program Precision Micro	0.0
B-borough F1 Micro	0.7059
B-borough Recall Micro	0.6667
B-borough Precision Micro	0.75
B-price F1 Micro	0.0
B-price Recall Micro	0.0
B-price Precision Micro	0.0
I-price F1 Micro	0.0
I-price Recall Micro	0.0
I-price Precision Micro	0.0
B-character F1 Micro	0.0
B-character Recall Micro	0.0
B-character Precision Micro	0.0
I-character F1 Micro	0.0
I-character Recall Micro	0.0
I-character Precision Micro	0.0
B-website F1 Micro	0.0
B-website Recall Micro	0.0
B-website Precision Micro	0.0
B-street F1 Micro	0.4000
B-street Recall Micro	0.4286
B-street Precision Micro	0.375
I-street F1 Micro	0.3256
I-street Recall Micro	1.0
I-street Precision Micro	0.1944
B-village F1 Micro	0.6667
B-village Recall Micro	0.7
B-village Precision Micro	0.6364
I-village F1 Micro	0.2222
I-village Recall Micro	0.875
I-village Precision Micro	0.1273
B-disease F1 Micro	0.5965
B-disease Recall Micro	0.7083
B-disease Precision Micro	0.5152
I-disease F1 Micro	0.3704
I-disease Recall Micro	0.7812
I-disease Precision Micro	0.2427
B-penalty F1 Micro	0.1579
B-penalty Recall Micro	0.1579
B-penalty Precision Micro	0.1579
I-penalty F1 Micro	0.1674
I-penalty Recall Micro	0.3175
I-penalty Precision Micro	0.1136
B-weapon F1 Micro	0.6715
B-weapon Recall Micro	0.7302
B-weapon Precision Micro	0.6216
I-weapon F1 Micro	0.2455
I-weapon Recall Micro	0.5965
I-weapon Precision Micro	0.1545
I-borough F1 Micro	0.4091
I-borough Recall Micro	0.6923
I-borough Precision Micro	0.2903
B-vehicle F1 Micro	0.6349
B-vehicle Recall Micro	0.5882
B-vehicle Precision Micro	0.6897
I-vehicle F1 Micro	0.4174
I-vehicle Recall Micro	0.7273
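Each F1 value in the table is the harmonic mean of the precision and recall reported for the same label. The short check below recomputes F1 for three rows copied from the table; the helper name `f1_score` is just for illustration. Because the table values are rounded, the recomputed figures are expected to agree only to about three decimal places.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "B-person": (0.9468, 0.9816, 0.9639),
    "I-person": (0.2036, 0.996, 0.3381),
    "B-age": (1.0, 0.7619, 0.8649),
}

for label, (precision, recall, f1) in reported.items():
    print(label, round(f1_score(precision, recall), 4), f1)
```

The I-person row shows a pattern that recurs for many `I-` labels in the table: recall is high while precision is low, which pulls F1 down.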


📄 License

This model is released under the MIT license.
